Think DSP

Digital Signal Processing in Python

Allen B. Downey

Think DSP

by Allen B. Downey

Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

  • Editors: Nan Barber and Susan Conant
  • Production Editor: Kristen Brown
  • Copyeditor: Kim Cofer
  • Proofreader: Rachel Head
  • Indexer: Allen B. Downey
  • Interior Designer: David Futato
  • Cover Designer: Karen Montgomery
  • Illustrator: Rebecca Demarest
  • July 2016: First Edition

Revision History for the First Edition

  • 2016-07-11: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491938454 for release details.

Preface

Signal processing is one of my favorite topics. It is useful in many areas of science and engineering, and if you understand the fundamental ideas, it provides insight into many things we see in the world, and especially the things we hear.

But unless you’ve studied electrical or mechanical engineering, you probably haven’t had a chance to learn about signal processing. The problem is that most books (and the classes that use them) present the material bottom-up, starting with mathematical abstractions like phasors. And they tend to be theoretical, with few applications and little apparent relevance.

The premise of this book is that if you know how to program, you can use that skill to learn other things, and have fun doing it.

With a programming-based approach, I can present the most important ideas right away. By the end of the first chapter, you’ll be able to analyze sound recordings and other signals, and generate new sounds. Each chapter introduces a new technique and an application you can apply to real signals. At each step you learn how to use a technique first, and then how it works.

This approach is more practical and, I hope you’ll agree, more fun.

Who Is This Book For?

The examples and supporting code for this book are in Python. You should know core Python and you should be familiar with object-oriented features, at least using objects if not defining your own.

If you are not already familiar with Python, you might want to start with my other book, Think Python, which is an introduction to Python for people who have never programmed, or Mark Lutz’s Learning Python, which might be better for people with programming experience.

I use NumPy and SciPy extensively. If you are familiar with them already, that’s great, but I will also explain the functions and data structures I use.

I assume that the reader knows basic mathematics, including complex numbers. You don’t need much calculus; if you understand the concepts of integration and differentiation, that will do. I use some linear algebra, but I will explain it as we go along.

Using the Code

The code and sound samples used in this book are available from this GitHub repository: https://github.com/AllenDowney/ThinkDSP. If you are not familiar with Git and GitHub, Git is a version control system that allows you to keep track of the files that make up a project. A collection of files under Git’s control is called a “repository”. GitHub is a hosting service that provides storage for Git repositories and a convenient web interface.

The GitHub home page for my repository provides several ways to work with the code:

  • You can create a copy of my repository on GitHub by pressing the Fork button. If you don’t already have a GitHub account, you’ll need to create one. After forking, you’ll have your own repository on GitHub that you can use to keep track of code you write while working on this book. Then you can clone the repository, which means that you copy the files to your computer.

  • You can clone my repository. You don’t need a GitHub account to do this, but you won’t be able to write your changes back to GitHub.

  • If you don’t want to use Git at all, you can download the files in a ZIP file using the button in the lower-right corner of the GitHub page.

All of the code is written to work in both Python 2 and Python 3 with no translation.

I developed this book using Anaconda from Continuum Analytics, which is a free Python distribution that includes all the packages you’ll need to run the code (and lots more). I found Anaconda easy to install. By default it does a user-level installation, not system-level, so you don’t need administrative privileges. And it supports both Python 2 and Python 3. You can download Anaconda from http://continuum.io/downloads.

If you don’t want to use Anaconda, you will need the following packages:

Although these are commonly used packages, they are not included with all Python installations, and they can be hard to install in some environments. If you have trouble installing them, I recommend using Anaconda or one of the other Python distributions that include these packages.

Most exercises use Python scripts, but some also use Jupyter notebooks. If you have not used Jupyter before, you can read about it at http://jupyter.org.

There are three ways you can work with the Jupyter notebooks:

Run Jupyter on your computer

If you installed Anaconda, you probably got Jupyter by default. To check, start the server from the command line, like this:

$ jupyter notebook

If it’s not installed, you can install it in Anaconda like this:

$ conda install jupyter

When you start the server, it should launch your default web browser or create a new tab in an open browser window.

Run Jupyter on Binder

Binder is a service that runs Jupyter in a virtual machine. If you follow the link http://mybinder.org/repo/AllenDowney/ThinkDSP, you should get a Jupyter home page with the notebooks for this book and the supporting data and scripts.

You can run the scripts and modify them to run your own code, but the virtual machine you run in is temporary. Any changes you make will disappear, along with the virtual machine, if you leave it idle for more than about an hour.

View notebooks on nbviewer

When I refer to notebooks later in the book, I provide links to nbviewer, which provides a static view of the code and results. You can use these links to read the notebooks and listen to the examples, but you won’t be able to modify or run the code, or use the interactive widgets.

Good luck, and have fun!

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates emphasis, keystrokes, menu options, URLs, and email addresses.

Bold

Used for new terms where they are defined.

Constant width

Used for program listings, as well as within paragraphs to refer to filenames, file extensions, and program elements such as variable and function names, data types, statements, and keywords.

Constant width bold

Shows commands or other text that should be typed literally by the user.

Safari® Books Online

Safari Books Online (www.safaribooksonline.com) is an on-demand digital library that delivers expert content in both book and video form from the world’s leading authors in technology and business.

Technology professionals, software developers, web designers, and business and creative professionals use Safari Books Online as their primary resource for research, problem solving, learning, and certification training.

Safari Books Online offers a range of plans and pricing for enterprise, government, education, and individuals.

Members have access to thousands of books, training videos, and prepublication manuscripts in one fully searchable database from publishers like O’Reilly Media, Prentice Hall Professional, Addison-Wesley Professional, Microsoft Press, Sams, Que, Peachpit Press, Focal Press, Cisco Press, John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, Course Technology, and hundreds more. For more information about Safari Books Online, please visit us online.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

  • O’Reilly Media, Inc.
  • 1005 Gravenstein Highway North
  • Sebastopol, CA 95472
  • 800-998-9938 (in the United States or Canada)
  • 707-829-0515 (international or local)
  • 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/think-dsp.

To comment or ask technical questions about this book, send email to .

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly

Follow us on Twitter: http://twitter.com/oreillymedia

Watch us on YouTube: http://www.youtube.com/oreillymedia

Contributor List

If you have a suggestion or correction, please send email to downey@allendowney.com. If I make a change based on your feedback, I will add you to the contributor list (unless you ask to be omitted).

If you include at least part of the sentence the error appears in, that makes it easy for me to search. Page numbers and section titles are fine, too, but not as easy to work with. Thanks!

  • Before I started writing, my thoughts about this book benefited from conversations with Boulos Harb at Google and Aurelio Ramos, formerly at Harmonix Music Systems.

  • During the Fall 2013 semester, Nathan Lintz and Ian Daniher worked with me on an independent study project and helped me with the first draft of this book.

  • On Reddit’s DSP forum, the anonymous user RamjetSoundwave helped me fix a problem with my implementation of Brownian noise. And andodli found a typo.

  • In Spring 2015 I had the pleasure of teaching this material along with Prof. Oscar Mur-Miranda and Prof. Siddhartan Govindasamy. Both made many suggestions and corrections.

  • Silas Gyger corrected an arithmetic error.

  • Giuseppe Masetti sent a number of very helpful suggestions.

Special thanks to the technical reviewers, Eric Peters, Bruce Levens, and John Vincent, for many helpful suggestions, clarifications, and corrections.

Also thanks to Freesound, which is the source of many of the sound samples I use in this book, and to the Freesound users who contributed those samples. I include some of their wave files in the GitHub repository for this book, using the original filenames, so it should be easy to find their sources.

Unfortunately, most Freesound users don’t make their real names available, so I can only thank them by their usernames. Samples used in this book were contributed by Freesound users iluppai, wcfl10, thirsk, docquesting, kleeb, landup, zippi1, themusicalnomad, bcjordan, rockwehrmann, marcgascon7, and jcveliz. Thank you all!

Chapter 1. Sounds and Signals

A signal represents a quantity that varies in time. That definition is pretty abstract, so let’s start with a concrete example: sound. Sound is variation in air pressure. A sound signal represents variations in air pressure over time.

A microphone is a device that measures these variations and generates an electrical signal that represents sound. A speaker is a device that takes an electrical signal and produces sound. Microphones and speakers are called transducers because they transduce, or convert, signals from one form to another.

This book is about signal processing, which includes processes for synthesizing, transforming, and analyzing signals. I will focus on sound signals, but the same methods apply to electronic signals, mechanical vibration, and signals in many other domains.

They also apply to signals that vary in space rather than time, like elevation along a hiking trail. And they apply to signals in more than one dimension, like an image, which you can think of as a signal that varies in two-dimensional space. Or a movie, which is a signal that varies in two-dimensional space and time.

But we start with simple one-dimensional sound.

The code for this chapter is in chap01.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp01.

Periodic Signals

We’ll start with periodic signals, which are signals that repeat themselves after some period of time. For example, if you strike a bell, it vibrates and generates sound. If you record that sound and plot the transduced signal, it looks like Figure 1-1.

Figure 1-1. Segment from a recording of a bell.

This signal resembles a sinusoid, which means it has the same shape as the trigonometric sine function.

You can see that this signal is periodic. I chose the duration to show three full repetitions, also known as cycles. The duration of each cycle, called the period, is about 2.3 ms.

The frequency of a signal is the number of cycles per second, which is the inverse of the period. The units of frequency are cycles per second, or Hertz, abbreviated “Hz”. (Strictly speaking, the number of cycles is a dimensionless number, so a Hertz is really a “per second”.)

The frequency of this signal is about 439 Hz, slightly lower than 440 Hz, which is the standard tuning pitch for orchestral music. The musical name of this note is A, or more specifically, A4. If you are not familiar with “scientific pitch notation”, the numerical suffix indicates which octave the note is in. A4 is the A above middle C. A5 is one octave higher. See http://en.wikipedia.org/wiki/Scientific_pitch_notation.
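As a quick check of the numbers above, the relation between period and frequency can be worked through in a few lines of Python. This is only an illustrative sketch: the variable names are made up for the example, and the 2.3 ms period is the approximate value read off Figure 1-1, so the result agrees only roughly with the measured 439 Hz.

```python
# Period read approximately from the plot, in seconds (an estimate).
period = 2.3e-3

# Frequency is the inverse of the period, in cycles per second (Hz).
freq = 1 / period

# A5 is one octave above A4, and an octave is a doubling in frequency.
a4 = 440.0
a5 = 2 * a4

print(round(freq, 1), a5)
```

The rounded period gives a frequency near 435 Hz, which is why the text says "about": a small error in reading the period shifts the estimate by a few hertz.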

A tuning fork generates a sinusoid because the vibration of the tines is a form of simple harmonic motion. Most musical instruments produce periodic signals, but the shape of these signals is not sinusoidal. For example, Figure 1-2 shows a segment from a recording of a violin playing Boccherini’s String Quintet No. 5 in E, 3rd movement.

Figure 1-2. Segment from a recording of a violin.

Again we can see that the signal is periodic, but the shape of the signal is more complex. The shape of a periodic signal is called the waveform. Most musical instruments produce waveforms more complex than a sinusoid. The shape of the waveform determines the musical timbre, which is our perception of the quality of the sound. People usually perceive complex waveforms as rich, warm, and more interesting than sinusoids.

Spectral Decomposition

The most important topic in this book is spectral decomposition, which is the idea that any signal can be expressed as the sum of sinusoids with different frequencies.

The most important mathematical idea in this book is the Discrete Fourier Transform (DFT), which takes a signal and produces its spectrum. The spectrum is the set of sinusoids that add up to produce the signal.

And the most important algorithm in this book is the Fast Fourier Transform (FFT), which is an efficient way to compute the DFT.

For example, Figure 1-3 shows the spectrum of the violin recording in Figure 1-2. The x-axis is the range of frequencies that make up the signal. The y-axis shows the strength or amplitude of each frequency component.

Figure 1-3. Spectrum of a segment from the violin recording.

The lowest frequency component is called the fundamental frequency. The fundamental frequency of this signal is near 440 Hz (actually a little lower, or “flat”).

In this signal the fundamental frequency has the largest amplitude, so it is also the dominant frequency. Normally the perceived pitch of a sound is determined by the fundamental frequency, even if it is not dominant.

The other spikes in the spectrum are at frequencies 880, 1320, 1760, and 2200, which are integer multiples of the fundamental. These components are called harmonics because they are musically harmonious with the fundamental:

  • 880 is the frequency of A5, one octave higher than the fundamental. An octave is a doubling in frequency.

  • 1320 is approximately E6, which is a perfect fifth above A5. If you are not familiar with musical intervals like “perfect fifth”, see https://en.wikipedia.org/wiki/Interval_(music).

  • 1760 is A6, two octaves above the fundamental.

  • 2200 is approximately C♯7, which is a major third above A6.

These harmonics make up the notes of an A major chord, although not all in the same octave. Some of them are only approximate because the notes that make up Western music have been adjusted for equal temperament (see http://en.wikipedia.org/wiki/Equal_temperament).
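To see why some of these matches are only approximate, one can compare each harmonic with the nearest equal-tempered note. The sketch below assumes the usual twelve-tone equal temperament rule, in which each semitone multiplies the frequency by 2^(1/12), with A4 = 440 Hz as the reference. The helper name `note_freq` and the semitone counts are written for this example and do not come from thinkdsp.

```python
# Twelve-tone equal temperament: each semitone is a factor of 2**(1/12).
A4 = 440.0

def note_freq(semitones_above_a4):
    """Frequency of the note a given number of semitones above A4."""
    return A4 * 2 ** (semitones_above_a4 / 12.0)

# (harmonic in Hz, note name, semitones above A4)
harmonics = [
    (880, 'A5', 12),
    (1320, 'E6', 19),
    (1760, 'A6', 24),
    (2200, 'C#7', 28),
]

for harmonic, name, semitones in harmonics:
    tempered = note_freq(semitones)
    print(name, harmonic, round(tempered, 1))
```

Under these assumptions the octaves match exactly, while the tempered E6 sits about 1.5 Hz below the harmonic at 1320 Hz and the tempered C♯7 about 17 Hz above the harmonic at 2200 Hz.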

Given the harmonics and their amplitudes, you can reconstruct the signal by adding up sinusoids. Next we’ll see how.

Signals

I wrote a Python module called thinkdsp.py that contains classes and functions for working with signals and spectrums1. You will find it in the repository for this book (see “Using the Code”).

To represent signals, thinkdsp provides a class called Signal. This is the parent class for several signal types, including Sinusoid, which represents both sine and cosine signals.

thinkdsp provides functions to create sine and cosine signals:

cos_sig = thinkdsp.CosSignal(freq=440, amp=1.0, offset=0)
sin_sig = thinkdsp.SinSignal(freq=880, amp=0.5, offset=0)

freq is frequency in Hz. amp is amplitude in unspecified units, where 1.0 is defined as the largest amplitude we can record or play back.

offset is a phase offset in radians. Phase offset determines where in the period the signal starts. For example, a sine signal with offset=0 starts at sin 0, which is 0. With offset=pi/2 it starts at sin π/2, which is 1.
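The effect of the phase offset can be checked numerically. The following is a minimal sketch that uses NumPy directly rather than thinkdsp. It assumes a sine signal is evaluated as amp · sin(2π · freq · t + offset), which is the conventional form but has not been shown in the text up to this point, and the function name `sine_value` is made up for the example.

```python
import numpy as np

def sine_value(t, freq=440, amp=1.0, offset=0):
    """Value at time t of a sine signal with a phase offset in radians."""
    return amp * np.sin(2 * np.pi * freq * t + offset)

# At t = 0 only the offset contributes to the phase.
start_no_offset = sine_value(0, offset=0)           # sin(0)
start_quarter = sine_value(0, offset=np.pi / 2)     # sin(pi/2)

print(start_no_offset, start_quarter)
```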

Signals have an __add__ method, so you can use the + operator to add them:

mix = sin_sig + cos_sig

The result is a SumSignal, which represents the sum of two or more signals.
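One way to picture what such a sum represents is to treat each signal as a function of time and add the functions pointwise. The class below, `ToySignal`, is a hypothetical stand-in written for this illustration. It is not the thinkdsp implementation, and its `evaluate` method is an assumed name.

```python
import numpy as np

class ToySignal(object):
    """Hypothetical stand-in for a signal: wraps a function of time."""

    def __init__(self, func):
        self.func = func

    def evaluate(self, ts):
        return self.func(np.asarray(ts, dtype=float))

    def __add__(self, other):
        # The sum of two signals is a new signal whose value at each
        # time is the sum of the two individual values.
        return ToySignal(lambda ts: self.evaluate(ts) + other.evaluate(ts))

cos_sig = ToySignal(lambda ts: 1.0 * np.cos(2 * np.pi * 440 * ts))
sin_sig = ToySignal(lambda ts: 0.5 * np.sin(2 * np.pi * 880 * ts))

# Python turns the + operator into a call to sin_sig.__add__(cos_sig).
mix = sin_sig + cos_sig
print(mix.evaluate([0.0]))
```

Nothing is computed when the signals are added. The sum is only worked out when the combined signal is evaluated at specific times.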

A Signal is basically a Python representation of a mathematical function. Most signals are defined for all values of t, from negative infinity to infinity.

You can’t do much with a Signal until you evaluate it. In this context, “evaluate” means taking a sequence of points in time, ts, and computing the corresponding values of the signal, ys. I represent ts and ys using NumPy arrays and encapsulate them in an object called a Wave.

A Wave represents a signal evaluated at a sequence of points in time. Each point in time is called a frame (a term borrowed from movies and video). The measurement itself is called a sample, although “frame” and “sample” are sometimes used interchangeably.

Signal provides make_wave, which returns a new Wave object:

wave = mix.make_wave(duration=0.5, start=0, framerate=11025)

duration is the length of the Wave in seconds. start is the start time, also in seconds. framerate is the (integer) number of frames per second, which is also the number of samples per second.

11,025 frames per second is one of several frame rates commonly used in audio file formats, including Waveform Audio File (WAV) and MP3.

This example evaluates the signal from t=0 to t=0.5 at 5513 equally spaced frames (because 5513 is half of 11,025, rounded up). The time between frames, or timestep, is 1/11,025 seconds, about 91 μs.
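The frame count and timestep can be reproduced with a few lines of NumPy. This is a sketch under stated assumptions rather than the code inside make_wave: in particular, rounding 5512.5 up to 5513 is chosen here to match the count given in the text.

```python
import numpy as np

duration = 0.5      # seconds
start = 0           # seconds
framerate = 11025   # frames per second

# Number of frames; adding 0.5 before truncating rounds 5512.5 up.
# (This rounding rule is an assumption made to match the text.)
n = int(duration * framerate + 0.5)

# Time between consecutive frames, in seconds.
timestep = 1.0 / framerate

# Equally spaced sample times, beginning at start.
ts = start + np.arange(n) * timestep

print(n, round(timestep * 1e6, 1))
```

The last sample time is just under 0.5 seconds, because the frames begin at t=0 and the end point itself is not included.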

Wave provides a plot method that uses pyplot. You can plot the wave like this:

wave.plot()
pyplot.show()

pyplot is part of matplotlib; it is included in many Python distributions, or you might have to install it.

At freq=440 there are 220 periods in 0.5 seconds, so this plot would look like a solid block of color. To zoom in on a small number of periods, we can use segment, which copies a segment of a Wave and returns a new wave:

period = mix.period
segment = wave.segment(start=0, duration=period*3)

period is a property of a Signal; it returns the period in seconds.

start and duration are in seconds. This example copies the first three periods from mix. The result is a Wave object.
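Extracting a segment amounts to keeping the frames whose times fall inside a window. The sketch below does this with a boolean mask on plain NumPy arrays. It is an illustration, not the way segment is implemented, and it assumes the period of the mix is 1/440 s, since the 880 Hz component repeats exactly twice in each 440 Hz cycle.

```python
import numpy as np

framerate = 11025
ts = np.arange(5513) / float(framerate)
ys = np.cos(2 * np.pi * 440 * ts) + 0.5 * np.sin(2 * np.pi * 880 * ts)

# Assumed period of the mix: that of its lowest-frequency component.
period = 1.0 / 440
start, duration = 0, period * 3

# Keep only the frames whose times fall in [start, start + duration).
keep = (ts >= start) & (ts < start + duration)
seg_ts, seg_ys = ts[keep], ys[keep]

print(len(seg_ts))
```

Three periods at this frame rate cover only a few dozen frames, which is why the zoomed-in plot shows the shape of the waveform rather than a solid block.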

If we plot segment, it looks like Figure 1-4. This signal contains two frequency components, so it is more complicated than the signal from the tuning fork, but less complicated than the violin.

Figure 1-4. Segment from a mixture of two sinusoid signals.

Reading and Writing Waves

thinkdsp provides read_wave, which reads a WAV file and returns a Wave:

violin_wave = thinkdsp.read_wave('input.wav')

And Wave provides write, which writes a WAV file:

wave.write(filename='output.wav')
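For readers curious about what writing and reading a WAV file involves, here is a rough round trip using Python's standard wave module and NumPy. It is a sketch under assumptions that do not come from the text: the file is mono, samples are stored as 16-bit little-endian integers, and an amplitude of 1.0 maps to 32767. thinkdsp may handle these details differently.

```python
import os
import tempfile
import wave

import numpy as np

framerate = 11025
ts = np.arange(framerate // 2) / float(framerate)
ys = 0.5 * np.sin(2 * np.pi * 440 * ts)

# Scale to 16-bit signed integers, little-endian (assumed sample format).
samples = np.round(ys * 32767).astype('<i2')

path = os.path.join(tempfile.mkdtemp(), 'output.wav')

# Write the samples as a mono, 16-bit WAV file.
writer = wave.open(path, 'wb')
writer.setnchannels(1)          # mono
writer.setsampwidth(2)          # 2 bytes = 16 bits per sample
writer.setframerate(framerate)
writer.writeframes(samples.tobytes())
writer.close()

# Read the file back and recover the integer samples.
reader = wave.open(path, 'rb')
read_rate = reader.getframerate()
raw = reader.readframes(reader.getnframes())
reader.close()

read_back = np.frombuffer(raw, dtype='<i2')
```

Dividing `read_back` by 32767.0 would bring the values back into the range from -1.0 to 1.0 used for amplitudes in this chapter.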

You can listen to the Wave with any media player that plays WAV files. On Unix systems I use aplay, which is simple, robust, and included in many Linux distributions.

thinkdsp also provides play_wave, which runs the media player as a subprocess:

thinkdsp.play_wave(filename='output.wav', player='aplay')

It uses aplay by default, but you can provide the name of another player.

Spectrums

Wave provides make_spectrum, which returns a Spectrum:

spectrum = wave.make_spectrum()

And Spectrum provides plot:

spectrum.plot()
thinkplot.show()

thinkplot is a module I wrote to provide wrappers around some of the functions in pyplot. It is included in the Git repository for this book (see “Using the Code”).

Spectrum provides three methods that modify the spectrum:

  • low_pass applies a low-pass filter, which means that components above a given cutoff frequency are attenuated (that is, reduced in magnitude) by a factor.

  • high_pass applies a high-pass filter, which means that it attenuates components below the cutoff.

  • band_stop attenuates components in the band of frequencies between two cutoffs.

This example attenuates all frequencies above 600 by 99%:

spectrum.low_pass(cutoff=600, factor=0.01)
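To make the idea of attenuating components concrete, here is a rough NumPy-only sketch of a low-pass step applied to the two-component mix from earlier in the chapter. It uses `np.fft.rfft`, `np.fft.rfftfreq`, and `np.fft.irfft` directly. Whether low_pass is implemented this way is not stated in the text, so treat it as one possible reading rather than the module's code. The one-second duration is chosen so that each frequency bin falls on a whole number of hertz.

```python
import numpy as np

framerate = 11025
n = framerate                       # one second of samples
ts = np.arange(n) / float(framerate)
ys = np.cos(2 * np.pi * 440 * ts) + 0.5 * np.sin(2 * np.pi * 880 * ts)

# Spectrum of a real-valued signal, and the frequency of each component.
hs = np.fft.rfft(ys)
fs = np.fft.rfftfreq(n, d=1.0 / framerate)

# Attenuate every component above the cutoff by the given factor.
cutoff, factor = 600, 0.01
hs[fs > cutoff] *= factor

# Convert back to the time domain; n is passed explicitly because it is odd.
filtered = np.fft.irfft(hs, n=n)
```

With these inputs the 440 Hz component passes through unchanged, and the 880 Hz component is reduced to 1% of its original amplitude.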

A low-pass filter removes bright, high-frequency sounds, so the result sounds muffled and darker. To hear what it sounds like, you can convert the Spectrum back to a Wave, and then play it:

wave = spectrum.make_wave()
wave.play('temp.wav')

The play method writes the wave to a file and then plays it. If you use Jupyter notebooks, you can use make_audio, which makes an Audio widget that plays the sound.

Wave Objects

There is nothing very complicated in thinkdsp.py. Most of the functions it provides are thin wrappers around functions from NumPy and SciPy.

The primary classes in thinkdsp are Signal, Wave, and Spectrum. Given a Signal, you can make a Wave. Given a Wave, you can make a Spectrum, and vice versa. These relationships are shown in Figure 1-5.

Figure 1-5. Relationships among the classes in thinkdsp.

A Wave object contains three attributes: ys is a NumPy array that contains the values in the signal; ts is an array of the times where the signal was evaluated or sampled; and framerate is the number of samples per unit of time. The unit of time is usually seconds, but it doesn’t have to be. In one of my examples, it’s days.

Wave also provides three read-only properties: start, end, and duration. If you modify ts, these properties change accordingly.

To modify a wave, you can access the ts and ys directly. For example:

wave.ys *= 2
wave.ts += 1

The first line scales the wave by a factor of 2, making it louder. The second line shifts the wave in time, making it start 1 second later.

But Wave provides methods that perform many common operations. For example, the same two transformations could be written:

wave.scale(2)
wave.shift(1)

You can read the documentation of these methods and others at http://greenteapress.com/thinkdsp.html.
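As a rough illustration of the attributes and methods described above, the following sketch defines a small stand-in class. MiniWave is hypothetical and is not the thinkdsp implementation; it only mirrors the behavior the text describes (ys, ts, framerate, the three read-only properties, and scale and shift).

```python
import numpy as np

class MiniWave:
    """Hypothetical stand-in for thinkdsp.Wave, for illustration only."""

    def __init__(self, ys, ts, framerate):
        self.ys = np.asarray(ys, dtype=float)   # sample values
        self.ts = np.asarray(ts, dtype=float)   # sample times
        self.framerate = framerate              # samples per unit of time

    @property
    def start(self):
        return self.ts[0]

    @property
    def end(self):
        return self.ts[-1]

    @property
    def duration(self):
        # number of samples divided by samples per unit of time
        return len(self.ys) / self.framerate

    def scale(self, factor):
        self.ys *= factor    # louder if factor > 1

    def shift(self, delta):
        self.ts += delta     # later if delta > 0

ts = np.arange(100) / 100
wave = MiniWave(np.sin(2 * np.pi * 5 * ts), ts, framerate=100)
wave.scale(2)
wave.shift(1)
```

Because start and end are computed from ts on demand, they follow any change to ts without extra bookkeeping.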

Signal Objects

Signal is a parent class that provides functions common to all kinds of signals, like make_wave. Child classes inherit these methods and provide evaluate, which evaluates the signal at a given sequence of times.

For example, Sinusoid is a child class of Signal, with this definition:

class Sinusoid(Signal):
    
    def __init__(self, freq=440, amp=1.0, offset=0, func=np.sin):
        Signal.__init__(self)
        self.freq = freq
        self.amp = amp
        self.offset = offset
        self.func = func

The parameters of __init__ are:

freq

Frequency in cycles per second, or Hz.

amp

Amplitude. The units of amplitude are arbitrary, usually chosen so 1.0 corresponds to the maximum input from a microphone or maximum output to a speaker.

offset

Indicates where in its period the signal starts; offset is in units of radians.

func

A Python function used to evaluate the signal at a particular point in time. It is usually either np.sin or np.cos, yielding a sine or cosine signal.

Like many init methods, this one just tucks the parameters away for future use.

Signal provides make_wave, which looks like this:

def make_wave(self, duration=1, start=0, framerate=11025):
    n = round(duration * framerate)
    ts = start + np.arange(n) / framerate
    ys = self.evaluate(ts)
    return Wave(ys, ts, framerate=framerate)

start and duration are the start time and duration in seconds. framerate is the number of frames (samples) per second.

n is the number of samples, and ts is a NumPy array of sample times.
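A small worked example may help show how n and ts come out. The values of duration, start, and framerate below are arbitrary choices for illustration, not values used elsewhere in the book.

```python
import numpy as np

duration, start, framerate = 0.5, 2.0, 8

n = round(duration * framerate)          # number of samples
ts = start + np.arange(n) / framerate    # evenly spaced sample times

print(n)
print(ts)
```

Note that the sample times begin at start and stop one step short of start + duration, so consecutive waves can be joined without repeating a sample.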

To compute the ys, make_wave invokes evaluate, which is provided by Sinusoid:

def evaluate(self, ts):
    phases = PI2 * self.freq * ts + self.offset
    ys = self.amp * self.func(phases)
    return ys

Let’s unwind this function one step at a time:

  1. self.freq is frequency in cycles per second, and each element of ts is a time in seconds, so their product is the number of cycles since the start time.

  2. PI2 is a constant that stores $2\pi$. Multiplying by PI2 converts from cycles to phase. You can think of phase as “cycles since the start time” expressed in radians. Each cycle is $2\pi$ radians.

  3. self.offset is the phase when $t = 0$. It has the effect of shifting the signal left or right in time.

  4. If self.func is np.sin or np.cos, the result is a value between $-1$ and $+1$.

  5. Multiplying by self.amp yields a signal that ranges from -self.amp to +self.amp.

In math notation, evaluate is written like this:

$$y = A \cos(2\pi f t + \phi_0)$$

where A is amplitude, f is frequency, t is time, and $\phi_0$ is the phase offset (the formula is shown here for func = np.cos). It may seem like I wrote a lot of code to evaluate one simple expression, but as we’ll see, this code provides a framework for dealing with all kinds of signals, not just sinusoids.
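The same computation can be tried outside the class. The sketch below is a standalone function, evaluate_sinusoid, which is a name I made up for illustration; it follows the evaluate logic shown earlier and checks that a sine with an offset of $\pi/2$ matches a cosine.

```python
import numpy as np

PI2 = 2 * np.pi

def evaluate_sinusoid(ts, freq=440, amp=1.0, offset=0, func=np.sin):
    """Standalone version of the evaluate logic (hypothetical name)."""
    phases = PI2 * freq * ts + offset    # cycles converted to radians
    return amp * func(phases)

ts = np.arange(8) / 8                    # eight samples across one second
ys = evaluate_sinusoid(ts, freq=1, amp=0.5, offset=np.pi / 2, func=np.sin)
expected = 0.5 * np.cos(PI2 * ts)        # sin(x + pi/2) equals cos(x)
```

Shifting a sine by a quarter cycle gives a cosine, which is why offset and func together cover both cases.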

Exercises

Before you begin these exercises, you should download the code for this book, following the instructions in “Using the Code”.

Solutions to these exercises are in chap01soln.ipynb.

Exercise 1-1.

If you have Jupyter, load chap01.ipynb, read through it, and run the examples. You can also view this notebook at http://tinyurl.com/thinkdsp01.

Exercise 1-2.

Go to http://freesound.org and download a sound sample that includes music, speech, or other sounds that have a well-defined pitch. Select a roughly half-second segment where the pitch is constant. Compute and plot the spectrum of the segment you selected. What connection can you make between the timbre of the sound and the harmonic structure you see in the spectrum?

Use high_pass, low_pass, and band_stop to filter out some of the harmonics. Then convert the spectrum back to a wave and listen to it. How does the sound relate to the changes you made in the spectrum?

Exercise 1-3.

Synthesize a compound signal by creating SinSignal and CosSignal objects and adding them up. Evaluate the signal to get a Wave, and listen to it. Compute its Spectrum and plot it. What happens if you add frequency components that are not multiples of the fundamental?

Exercise 1-4.

Write a function called stretch that takes a Wave and a stretch factor and speeds up or slows down the wave by modifying ts and framerate. Hint: it should only take two lines of code.

1 The plural of “spectrum” is often written “spectra”, but I prefer to use standard English plurals. If you are familiar with “spectra”, I hope my choice doesn’t sound too strange.

Chapter 2. Harmonics

In this chapter I present several new waveforms; we will look at their spectrums to understand their harmonic structure, which is the set of sinusoids they are made up of.

I’ll also introduce one of the most important phenomena in digital signal processing: aliasing. And I’ll explain a little more about how the Spectrum class works.

The code for this chapter is in chap02.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp02.

Triangle Waves

A sinusoid contains only one frequency component, so its spectrum has only one peak. More complicated waveforms, like the violin recording in Figure 1-2, yield DFTs with many peaks. In this section we investigate the relationship between waveforms and their spectrums.

I’ll start with a triangle waveform, which is like a straight-line version of a sinusoid. Figure 2-1 shows a triangle waveform with frequency 200 Hz.

Figure 2-1. Segment of a triangle signal at 200 Hz.

To generate a triangle wave, you can start with a thinkdsp.TriangleSignal:

class TriangleSignal(Sinusoid):
    
    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = np.abs(frac - 0.5)
        ys = normalize(unbias(ys), self.amp)
        return ys

TriangleSignal inherits __init__ from Sinusoid, so it takes the same arguments: freq, amp, and offset.

The only difference is evaluate. As we saw before, ts is the sequence of sample times where we want to evaluate the signal.

There are many ways to generate a triangle wave. The details are not important, but here’s how evaluate works:

  1. cycles is the number of cycles since the start time. np.modf splits the number of cycles into the fraction part, stored in frac, and the integer part, which is ignored (see footnote 1).

  2. frac is a sequence that ramps from 0 to 1 with the given frequency. Subtracting 0.5 yields values between -0.5 and 0.5. Taking the absolute value yields a waveform that zigzags between 0.5 and 0.

  3. unbias shifts the waveform down so it is centered at 0; then normalize scales it to the given amplitude, amp.
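The steps above can be sketched as a standalone function. The helpers unbias and normalize below are my own guesses at what the thinkdsp functions of the same names do (subtract the mean, then rescale so the largest absolute value equals amp); treat them as assumptions rather than the library's code.

```python
import numpy as np

def unbias(ys):
    """Assumed behavior: shift the values so their mean is 0."""
    return ys - ys.mean()

def normalize(ys, amp=1.0):
    """Assumed behavior: scale so the largest absolute value is amp."""
    high, low = abs(ys.max()), abs(ys.min())
    return amp * ys / max(high, low)

def triangle(ts, freq, amp=1.0, offset=0.0):
    cycles = freq * ts + offset / (2 * np.pi)
    frac, _ = np.modf(cycles)       # keep the fraction, ignore the integer part
    ys = np.abs(frac - 0.5)         # zigzag between 0.5 and 0
    return normalize(unbias(ys), amp)

ts = np.arange(1000) / 10000        # 0.1 s at 10,000 frames per second
ys = triangle(ts, freq=200)         # 20 full periods
```

The result starts at its peak, because frac is 0 at the first sample and the absolute value of -0.5 is the largest value in the zigzag.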

Here’s the code that generates Figure 2-1:

signal = thinkdsp.TriangleSignal(200)
signal.plot()

Next we can use the Signal to make a Wave, and use the Wave to make a Spectrum:

wave = signal.make_wave(duration=0.5, framerate=10000)
spectrum = wave.make_spectrum()
spectrum.plot()

Figure 2-2 shows two views of the result; the view on the right is scaled to show the harmonics more clearly. As expected, the highest peak is at the fundamental frequency, 200 Hz, and there are additional peaks at harmonic frequencies, which are integer multiples of 200.

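Figure 2-2. Spectrum of a triangle signal at 200 Hz, shown on two vertical scales. The version on the right cuts off the fundamental to show the harmonics more clearly.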

But one surprise is that there are no peaks at the even multiples: 400, 800, etc. The harmonics of a triangle wave are all odd multiples of the fundamental frequency, in this example 600, 1000, 1400, etc.

Another feature of this spectrum is the relationship between the amplitude and frequency of the harmonics. Their amplitude drops off in proportion to frequency squared. For example, the frequency ratio of the first two harmonics (200 and 600 Hz) is 3, and the amplitude ratio is approximately 9. The frequency ratio of the next two harmonics (600 and 1000 Hz) is 1.7, and the amplitude ratio is approximately $1.7^2 \approx 2.9$. This relationship is called the harmonic structure.
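One way to check these ratios numerically is to build the triangle wave with NumPy and read the amplitudes off the FFT. This is a sketch that bypasses thinkdsp; the helper amp_at is a hypothetical convenience for looking up the bin nearest a frequency.

```python
import numpy as np

framerate = 10000
ts = np.arange(5000) / framerate               # 0.5 s, 100 full periods
frac, _ = np.modf(200 * ts)
ys = np.abs(frac - 0.5)
ys = ys - ys.mean()                            # center the wave at 0

amps = np.abs(np.fft.rfft(ys))
fs = np.fft.rfftfreq(len(ys), 1 / framerate)   # bins are 2 Hz apart

def amp_at(freq):
    """Hypothetical helper: amplitude in the bin closest to freq."""
    return amps[np.argmin(np.abs(fs - freq))]

ratio_1_3 = amp_at(200) / amp_at(600)          # frequency ratio 3
ratio_3_5 = amp_at(600) / amp_at(1000)         # frequency ratio about 1.7
print(ratio_1_3, ratio_3_5)
```

The printed ratios should land near 9 and a little under 3; they are not exact, partly because sampling adds small aliased contributions to each peak.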

Square Waves

thinkdsp also provides SquareSignal, which represents a square signal. Here’s the class definition:

class SquareSignal(Sinusoid):
    
    def evaluate(self, ts):
        cycles = self.freq * ts + self.offset / PI2
        frac, _ = np.modf(cycles)
        ys = self.amp * np.sign(unbias(frac))
        return ys

Like TriangleSignal, SquareSignal inherits __init__ from Sinusoid, so it takes the same parameters.

And the evaluate method is similar. Again, cycles is the number of cycles since the start time, and frac is the fractional part, which ramps from 0 to 1 each period.

unbias shifts frac so it ramps from –0.5 to 0.5, then np.sign maps the negative values to –1 and the positive values to 1. Multiplying by amp yields a square wave that jumps between -amp and amp.

Figure 2-3 shows three periods of a square wave with frequency 100 Hz, and Figure 2-4 shows its spectrum.

Like a triangle wave, the square wave contains only odd harmonics, which is why there are peaks at 300, 500, and 700 Hz, etc. But the amplitude of the harmonics drops off more slowly. Specifically, amplitude drops in proportion to frequency (not frequency squared).

The exercises at the end of this chapter give you a chance to explore other waveforms and other harmonic structures.

Figure 2-3. Segment of a square signal at 100 Hz.
Figure 2-4. Spectrum of a square signal at 100 Hz.

Aliasing

I have a confession. I chose the examples in the previous section carefully to avoid showing you something confusing. But now it’s time to get confused.

Figure 2-5 shows the spectrum of a triangle wave at 1100 Hz, sampled at 10,000 frames per second. Again, the view on the right is scaled to show the harmonics.

Figure 2-5. Spectrum of a triangle signal at 1100 Hz sampled at 10,000 frames per second. The view on the right is scaled to show the harmonics.

The harmonics of this wave should be at 3300, 5500, 7700, and 9900 Hz. In the figure, there are peaks at 1100 and 3300 Hz, as expected, but the third peak is at 4500, not 5500 Hz. The fourth peak is at 2300, not 7700 Hz. And if you look closely, the peak that should be at 9900 is actually at 100 Hz. What’s going on?

The problem is that when you evaluate the signal at discrete points in time, you lose information about what happened between samples. For low-frequency components, that’s not a problem, because you have lots of samples per period.

But if you sample a signal at 5000 Hz with 10,000 frames per second, you only have two samples per period. That turns out to be enough, just barely, but if the frequency is higher, it’s not.

To see why, let’s generate cosine signals at 4500 and 5500 Hz, and sample them at 10,000 frames per second:

framerate = 10000

signal = thinkdsp.CosSignal(4500)
duration = signal.period*5
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()

signal = thinkdsp.CosSignal(5500)
segment = signal.make_wave(duration, framerate=framerate)
segment.plot()

Figure 2-6 shows the result. I plotted the Signals with thin gray lines and the samples using vertical lines, to make it easier to compare the two Waves. The problem should be clear: even though the Signals are different, the Waves are identical!

Figure 2-6. Cosine signals at 4500 and 5500 Hz, sampled at 10,000 frames per second. The signals are different, but the samples are identical.

When we sample a 5500 Hz signal at 10,000 frames per second, the result is indistinguishable from a 4500 Hz signal. For the same reason, a 7700 Hz signal is indistinguishable from 2300 Hz, and a 9900 Hz signal is indistinguishable from 100 Hz.
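The claim that the samples are identical can be confirmed with plain NumPy, without thinkdsp. The choice of 50 samples below is arbitrary and made only for illustration.

```python
import numpy as np

framerate = 10000
ts = np.arange(50) / framerate             # 50 sample times

ys_4500 = np.cos(2 * np.pi * 4500 * ts)
ys_5500 = np.cos(2 * np.pi * 5500 * ts)

# Largest difference between the two sampled signals
max_diff = np.max(np.abs(ys_4500 - ys_5500))
print(max_diff)
```

The difference is at the level of floating-point rounding, because at these sample times the two phases differ from each other's negation by a whole number of cycles, and cosine is an even function.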

This effect is called aliasing because when the high-frequency signal is sampled, it appears to be a low-frequency signal.

In this example, the highest frequency we can measure is 5000 Hz, which is half the sampling rate. Frequencies above 5000 Hz are folded back below 5000 Hz, which is why this threshold is sometimes called the “folding frequency”. It is sometimes also called the Nyquist frequency. See http://en.wikipedia.org/wiki/Nyquist_frequency.

The folding pattern continues if the aliased frequency goes below zero. For example, the harmonic of the 1100 Hz triangle wave at 11 times the fundamental is at 12,100 Hz. Folded at 5000 Hz, it would appear at -2100 Hz, but it gets folded again at 0 Hz, back to 2100 Hz. In fact, you can see a small peak at 2100 Hz in Figure 2-5, and the next one at 4300 Hz.
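The folding rule can be written as a short function. alias_frequency is a hypothetical name used here for illustration; it assumes a real-valued signal, so negative frequencies are reported as positive ones.

```python
def alias_frequency(freq, framerate):
    """Frequency at which `freq` appears after sampling at `framerate`."""
    folding = framerate / 2          # the folding (Nyquist) frequency
    f = freq % framerate             # sampling cannot tell f from f + framerate
    return framerate - f if f > folding else f

for freq in [1100, 3300, 5500, 7700, 9900, 12100, 14300]:
    print(freq, alias_frequency(freq, 10000))
```

The modulo step handles any number of folds at once, which is why 12,100 Hz and 14,300 Hz come out at 2100 Hz and 4300 Hz without a separate case for negative frequencies.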

Computing the Spectrum

We have seen the Wave method make_spectrum several times. Here is the implementation (leaving out some details we’ll get to later):

from numpy.fft import rfft, rfftfreq

# class Wave:
    def make_spectrum(self):
        n = len(self.ys)
        d = 1 / self.framerate

        hs = rfft(self.ys)
        fs = rfftfreq(n, d)

        return Spectrum(hs, fs, self.framerate)

The parameter self is a Wave object. n is the number of samples in the wave, and d is the inverse of the frame rate, which is the time between samples.

np.fft is the NumPy module that provides functions related to the Fast Fourier Transform (FFT), which is an efficient algorithm that computes the Discrete Fourier Transform (DFT).

make_spectrum uses rfft, which stands for “real FFT”, because the Wave contains real values, not complex. Later we’ll see the full FFT, which can handle complex signals (see “DFT of Real Signals”). The result of rfft, which I call hs, is a NumPy array of complex numbers that represents the amplitude and phase offset of each frequency component in the wave.

The result of rfftfreq, which I call fs, is an array that contains frequencies corresponding to the hs.
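A tiny example shows how hs and fs line up. The eight-sample signal below is chosen for illustration so the arrays are short enough to read.

```python
import numpy as np

framerate = 8
n = 8
ys = np.cos(2 * np.pi * 2 * np.arange(n) / framerate)   # a 2 Hz cosine

hs = np.fft.rfft(ys)                    # n // 2 + 1 complex values
fs = np.fft.rfftfreq(n, 1 / framerate)  # frequencies for each element of hs

peak = fs[np.argmax(np.abs(hs))]        # frequency with the largest magnitude
print(fs)
print(peak)
```

For n real samples, rfft returns n // 2 + 1 values, running from frequency 0 up to the folding frequency, which is half the frame rate.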

To understand the values in hs, consider these two ways to think about complex numbers:

  • A complex number is the sum of a real part and an imaginary part, often written $x + iy$, where $i$ is the imaginary unit, $\sqrt{-1}$. You can think of x and y as Cartesian coordinates.

  • A complex number is also the product of a magnitude and a complex exponential, $A e^{i\phi}$, where A is the magnitude and ϕ is the angle in radians, also called the “argument”. You can think of A and ϕ as polar coordinates.

Each value in hs corresponds to a frequency component: its magnitude is proportional to the amplitude of the corresponding component; its angle is the phase offset.

The Spectrum class provides two read-only properties, amps and angles, which return NumPy arrays representing the magnitudes and angles of the hs. When we plot a Spectrum object, we usually plot amps versus fs. Sometimes it is also useful to plot angles versus fs.
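To see magnitudes and angles recovered from hs, the sketch below uses np.absolute and np.angle directly. I am assuming these are the operations behind amps and angles; the signal parameters are arbitrary, with the frequency chosen to fall exactly on an FFT bin.

```python
import numpy as np

n, framerate = 1000, 1000
ts = np.arange(n) / framerate
amp, freq, offset = 0.5, 50, np.pi / 4
ys = amp * np.cos(2 * np.pi * freq * ts + offset)

hs = np.fft.rfft(ys)
amps = np.absolute(hs)     # magnitudes (polar coordinate A)
angles = np.angle(hs)      # angles in radians (polar coordinate phi)

k = 50                     # with 1 Hz bins, index 50 is the 50 Hz component
print(amps[k], angles[k])
```

The magnitude comes out as amp times n / 2 rather than amp itself, which is the sense in which it is proportional to the amplitude; the angle matches the phase offset of the cosine.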

Although it might be tempting to look at the real and imaginary parts of hs, you will almost never need to. I encourage you to think of the DFT as a vector of amplitudes and phase offsets that happen to be encoded in the form of complex numbers.

To modify a Spectrum, you can access the hs directly. For example:

spectrum.hs *= 2
spectrum.hs[spectrum.fs > cutoff] = 0

The first line multiplies the elements of hs by 2, which doubles the amplitudes of all components. The second line sets to 0 only the elements of hs where the corresponding frequency exceeds some cutoff frequency.

But Spectrum also provides methods to perform these operations:

spectrum.scale(2)
spectrum.low_pass(cutoff)

You can read the documentation of these methods and others at http://greenteapress.com/thinkdsp.html.

At this point you should have a better idea of how the Signal, Wave, and Spectrum classes work, but I have not explained how the Fast Fourier Transform works. That will take a few more chapters.

Exercises

Solutions to these exercises are in chap02soln.ipynb.

Exercise 2-1.

If you use Jupyter, load chap02.ipynb and try out the examples. You can also view the notebook at http://tinyurl.com/thinkdsp02.

Exercise 2-2.

A sawtooth signal has a waveform that ramps up linearly from –1 to 1, then drops to –1 and repeats. See http://en.wikipedia.org/wiki/Sawtooth_wave.

Write a class called SawtoothSignal that extends Signal and provides evaluate to evaluate a sawtooth signal.

Compute the spectrum of a sawtooth wave. How does the harmonic structure compare to triangle and square waves?

Exercise 2-3.

Make a square signal at 1100 Hz and make a Wave that samples it at 10,000 frames per second. If you plot the spectrum, you can see that most of the harmonics are aliased. When you listen to the wave, can you hear the aliased harmonics?

Exercise 2-4.

If you have a spectrum object, spectrum, and print the first few values of spectrum.fs, you’ll see that they start at zero. So spectrum.hs[0] is the magnitude of the component with frequency 0. But what does that mean?

Try this experiment:

  1. Make a triangle signal with frequency 440 and make a Wave with duration 0.01 seconds. Plot the waveform.

  2. Make a Spectrum object and print spectrum.hs[0]. What is the amplitude and phase of this component?

  3. Set spectrum.hs[0] = 100. What effect does this operation have on the waveform? Hint: Spectrum provides a method called make_wave that computes the Wave that corresponds to the Spectrum.

Exercise 2-5.

Write a function that takes a Spectrum as a parameter and modifies it by dividing each element of hs by the corresponding frequency from fs. Hint: since division by zero is undefined, you might want to set spectrum.hs[0] = 0.

Test your function using a square, triangle, or sawtooth wave:

  1. Compute the Spectrum and plot it.

  2. Modify the Spectrum using your function and plot it again.

  3. Use Spectrum.make_wave to make a Wave from the modified Spectrum, and listen to it. What effect does this operation have on the signal?

Exercise 2-6.

Triangle and square waves have odd harmonics only; the sawtooth wave has both even and odd harmonics. The harmonics of the square and sawtooth waves drop off in proportion to $1/f$; the harmonics of the triangle wave drop off like $1/f^2$. Can you find a waveform that has even and odd harmonics that drop off like $1/f^2$?

Hint: there are two ways you could approach this. You could construct the signal you want by adding up sinusoids, or you could start with a signal that is similar to what you want and modify it.

1 Using an underscore as a variable name is a convention that means, “I don’t intend to use this value.”

Chapter 3. Non-Periodic Signals

The signals we have worked with so far are periodic, which means that they repeat forever. It also means that the frequency components they contain do not change over time. In this chapter, we consider non-periodic signals, whose frequency components do change over time. In other words, pretty much all sound signals.

This chapter also presents spectrograms, a common way to visualize non-periodic signals.

The code for this chapter is in chap03.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp03.

Linear Chirp

We’ll start with a chirp, which is a signal with variable frequency. thinkdsp provides a Signal called Chirp that makes a sinusoid that sweeps linearly through a range of frequencies.

Here’s an example that sweeps from 220 to 880 Hz, which is two octaves from A3 to A5:

signal = thinkdsp.Chirp(start=220, end=880)
wave = signal.make_wave()

Figure 3-1 shows segments of this wave near the beginning, middle, and end. It’s clear that the frequency is increasing.

Figure 3-1. Chirp waveform near the beginning, middle, and end.

Before we go on, let’s see how Chirp is implemented. Here is the class definition:

class Chirp(Signal):
    
    def __init__(self, start=440, end=880, amp=1.0):
        self.start = start
        self.end = end
        self.amp = amp

start and end are the frequencies, in Hz, at the start and end of the chirp. amp is amplitude.

Here is the function that evaluates the signal:

def evaluate(self, ts):
    freqs = np.linspace(self.start, self.end, len(ts)-1)
    return self._evaluate(ts, freqs)

ts is the sequence of points in time where the signal should be evaluated; to keep this function simple, I assume they are equally spaced.

If the length of ts is n, you can think of it as a sequence of $n - 1$ intervals of time. To compute the frequency during each interval, I use np.linspace, which returns a NumPy array of $n - 1$ values between start and end.

_evaluate is a private method that does the rest of the math (see footnote 1):

def _evaluate(self, ts, freqs):
    dts = np.diff(ts)
    dphis = PI2 * freqs * dts
    phases = np.cumsum(dphis)
    phases = np.insert(phases, 0, 0)
    ys = self.amp * np.cos(phases)
    return ys

np.diff computes the difference between adjacent elements of ts, returning the length of each interval in seconds. If the elements of ts are equally spaced, the dts values are all the same.

The next step is to figure out how much the phase changes during each interval. In “Signal Objects” we saw that when frequency is constant, the phase, ϕ, increases linearly over time:

$$\phi = 2\pi f t$$

When frequency is a function of time, the change in phase during a short time interval, Δt, is:

$$\Delta\phi = 2\pi f(t)\,\Delta t$$

In Python, since freqs contains $f(t)$ and dts contains the time intervals, we can write:

dphis = PI2 * freqs * dts

Now, since dphis contains the changes in phase, we can get the total phase at each timestep by adding up the changes:

phases = np.cumsum(dphis)
phases = np.insert(phases, 0, 0)

np.cumsum computes the cumulative sum, which is almost what we want, but it doesn’t start at 0. So I use np.insert to add a 0 at the beginning.

The result is a NumPy array where the ith element contains the sum of the first i terms from dphis; that is, the total phase at the end of the ith interval. Finally, np.cos computes the amplitude of the wave as a function of phase (remember that phase is expressed in radians).
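Putting the pieces together, here is a standalone sketch of the chirp computation that does not depend on thinkdsp. The function name linear_chirp and its return of both ys and phases are my own choices for illustration.

```python
import numpy as np

def linear_chirp(ts, start=220, end=880, amp=1.0):
    """Standalone sketch of the chirp evaluation (hypothetical name)."""
    freqs = np.linspace(start, end, len(ts) - 1)   # one frequency per interval
    dts = np.diff(ts)                              # length of each interval
    dphis = 2 * np.pi * freqs * dts                # phase change per interval
    phases = np.insert(np.cumsum(dphis), 0, 0)     # total phase, starting at 0
    return amp * np.cos(phases), phases

ts = np.arange(11025) / 11025                      # one second at 11,025 fps
ys, phases = linear_chirp(ts)
```

Inserting the leading 0 restores the length to match ts, so there is one phase and one sample value for every sample time.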

If you know calculus, you might notice that the limit as Δt gets small is:

$$d\phi = 2\pi f(t)\,dt$$

Dividing through by dt yields:

$$\frac{d\phi}{dt} = 2\pi f(t)$$

In other words, frequency is the derivative of phase. Conversely, phase is the integral of frequency. When we used cumsum to go from frequency to phase, we were approximating integration.
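To see how close the cumulative sum comes to the true integral, the sketch below compares it with the closed-form phase of a linear chirp, $2\pi (f_0 t + (f_1 - f_0) t^2 / 2T)$. The symbols f0, f1, and T and the sample count are assumptions chosen for illustration.

```python
import numpy as np

f0, f1, T, n = 220.0, 880.0, 1.0, 10001
ts = np.linspace(0, T, n)

# Numerical phase: cumulative sum of per-interval phase changes
freqs = np.linspace(f0, f1, n - 1)
phases = np.insert(np.cumsum(2 * np.pi * freqs * np.diff(ts)), 0, 0)

# Exact phase: integral of 2*pi*f(t), with f(t) rising linearly from f0 to f1
exact = 2 * np.pi * (f0 * ts + (f1 - f0) * ts**2 / (2 * T))

max_error = np.max(np.abs(phases - exact))
print(max_error)
```

The two agree at the end of the chirp and differ by a small fraction of a radian in the middle; using more samples should shrink that gap.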

Exponential Chirp

When you listen to this chirp, you might notice that the pitch rises quickly at first and then slows down. The chirp spans two octaves, but it only takes 2/3 s to span the first octave, and twice as long to span the second.

The reason is that our perception of pitch depends on the logarithm of frequency. As a result, the interval we hear between two notes depends on the ratio of their frequencies, not the difference. “Interval” is the musical term for the perceived difference between two pitches.

For example, an octave is an interval where the ratio of two pitches is 2. So the interval from 220 to 440 Hz is one octave and the interval from 440 to 880 Hz is also one octave. The difference in frequency is bigger, but the ratio is the same.

As a result, if frequency increases linearly, as in a linear chirp, the perceived pitch increases logarithmically.

If you want the perceived pitch to increase linearly, the frequency has to increase exponentially. A signal with that shape is called an exponential chirp.

Here’s the ExpoChirp class definition:

class ExpoChirp(Chirp):
    
    def evaluate(self, ts):
        start, end = np.log10(self.start), np.log10(self.end)
        freqs = np.logspace(start, end, len(ts)-1)
        return self._evaluate(ts, freqs)

Instead of np.linspace, this version of evaluate uses np.logspace, which creates a series of frequencies whose logarithms are equally spaced, which means that they increase exponentially.
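As a quick check of what “logarithms equally spaced” means, the following sketch builds a short series with np.logspace and looks at the ratio between neighbors. The variable names and the choice of nine points are arbitrary assumptions made for illustration.

```python
import numpy as np

start, end, n = 220, 880, 9
freqs = np.logspace(np.log10(start), np.log10(end), n)

ratios = freqs[1:] / freqs[:-1]   # the same ratio between every pair of neighbors
midpoint = freqs[n // 2]          # halfway through the series, one octave above start
```

Every ratio is the same, and the middle element is 440 Hz, so each half of the series spans exactly one octave.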

That’s it; everything else is the same as Chirp. Here’s the code that makes one:

signal = thinkdsp.ExpoChirp(start=220, end=880)
wave = signal.make_wave(duration=1)

You can listen to the examples in chap03.ipynb and compare the linear and exponential chirps.

Spectrum of a Chirp

What do you think happens if you compute the spectrum of a chirp? Here’s an example that constructs a one-second, one-octave chirp and its spectrum:

signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1)
spectrum = wave.make_spectrum()

Figure 3-2 shows the result. The spectrum has components at every frequency from 220 to 440 Hz, with variations that look a little like the Eye of Sauron (see http://en.wikipedia.org/wiki/Sauron).

Figure 3-2. Spectrum of a one-second, one-octave chirp.

The spectrum is approximately flat between 220 and 440 Hz, which indicates that the signal spends equal time at each frequency in this range. Based on that observation, you should be able to guess what the spectrum of an exponential chirp looks like.

The spectrum gives hints about the structure of the signal, but it obscures the relationship between frequency and time. For example, we cannot tell by looking at this spectrum whether the frequency went up or down, or both.

Spectrogram

To recover the relationship between frequency and time, we can break the chirp into segments and plot the spectrum of each segment. The result is called a Short-Time Fourier Transform (STFT).

There are several ways to visualize an STFT, but the most common is a spectrogram, which shows time on the x-axis and frequency on the y-axis. Each column in the spectrogram shows the spectrum of a short segment, using color or grayscale to represent amplitude.

As an example, I’ll compute the spectrogram of this chirp:

signal = thinkdsp.Chirp(start=220, end=440)
wave = signal.make_wave(duration=1, framerate=11025)

Wave provides make_spectrogram, which returns a Spectrogram object:

spectrogram = wave.make_spectrogram(seg_length=512)
spectrogram.plot(high=700)

seg_length is the number of samples in each segment. I chose 512 because the FFT is most efficient when the number of samples is a power of 2.

Figure 3-3 shows the result. The x-axis shows time from 0 to 1 seconds. The y-axis shows frequency from 0 to 700 Hz. I cut off the top part of the spectrogram; the full range goes to 5512.5 Hz, which is half of the frame rate.

Figure 3-3. Spectrogram of a one-second, one-octave chirp.

The spectrogram shows clearly that frequency increases linearly over time. However, notice that the peak in each column is blurred across 2–3 cells. This blurring reflects the limited resolution of the spectrogram.

The Gabor Limit

The time resolution of the spectrogram is the duration of the segments, which corresponds to the width of the cells in the spectrogram. Since each segment is 512 frames, and there are 11,025 frames per second, the duration of each segment is about 0.046 seconds.

The frequency resolution is the frequency range between elements in the spectrum, which corresponds to the height of the cells. With 512 frames, we get 256 frequency components over a range from 0 to 5512.5 Hz, so the range between components is about 21.5 Hz.

More generally, if n is the segment length, the spectrum contains $n/2$ components. If the frame rate is r, the maximum frequency in the spectrum is $r/2$. So the time resolution is $n/r$ and the frequency resolution is:

$$\frac{r/2}{n/2}$$

which is $r/n$.

Ideally we would like time resolution to be small, so we can see rapid changes in frequency. And we would like frequency resolution to be small so we can see small changes in frequency. But you can’t have both. Notice that time resolution, $n/r$, is the inverse of frequency resolution, $r/n$. So if one gets smaller, the other gets bigger.

For example, if you double the segment length, you cut frequency resolution in half (which is good), but you double time resolution (which is bad). Even increasing the frame rate doesn’t help. You get more samples, but the range of frequencies increases at the same time.

This tradeoff is called the Gabor limit and it is a fundamental limitation of this kind of time–frequency analysis.
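To make the tradeoff concrete, here is a small sketch that tabulates both resolutions for a few segment lengths. The helper name `resolutions` is an assumption for illustration; it only encodes the two ratios, n/r and r/n, derived above.

```python
def resolutions(seg_length, framerate):
    """Return (time resolution in seconds, frequency resolution in Hz)."""
    time_res = seg_length / framerate
    freq_res = framerate / seg_length
    return time_res, freq_res

table = {n: resolutions(n, 11025) for n in (256, 512, 1024)}

for n, (time_res, freq_res) in table.items():
    print(n, round(time_res, 4), round(freq_res, 2))
```

Each time the segment length doubles, the frequency resolution halves and the time resolution doubles, so their product stays at 1 regardless of the frame rate.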

Leakage

In order to explain how make_spectrogram works, I have to explain windowing; and in order to explain windowing, I have to show you the problem it is meant to address, which is leakage.

The Discrete Fourier Transform (DFT), which we use to compute Spectrums, treats waves as if they are periodic; that is, it assumes that the finite segment it operates on is a complete period from an infinite signal that repeats over all time. In practice, this assumption is often false, which creates problems.

One common problem is discontinuity at the beginning and end of the segment. Because the DFT assumes that the signal is periodic, it implicitly connects the end of the segment back to the beginning to make a loop. If the end does not connect smoothly to the beginning, the discontinuity creates additional frequency components in the segment that are not in the signal.

As an example, let’s start with a sine signal that contains only one frequency component at 440 Hz:

signal = thinkdsp.SinSignal(freq=440)

If we select a segment that happens to be an integer multiple of the period, the end of the segment connects smoothly with the beginning, and the DFT behaves well:

duration = signal.period * 30
wave = signal.make_wave(duration)
spectrum = wave.make_spectrum()

Figure 3-4 (left) shows the result. As expected, there is a single peak at 440 Hz.

But if the duration is not a multiple of the period, bad things happen. With duration = signal.period * 30.25, the signal starts at 0 and ends at 1.

Figure 3-4 (middle) shows the spectrum of this segment. Again, the peak is at 440 Hz, but now there are additional components spread out from 240 to 640 Hz. This spread is called spectral leakage, because some of the energy that is actually at the fundamental frequency leaks into other frequencies.

Figure 3-4 (right) shows the spectrum of the windowed signal. Windowing has reduced leakage substantially, but not completely.

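Figure 3-4. Spectrum of a periodic segment of a sinusoid (left), a non-periodic segment (middle), and a windowed non-periodic segment (right).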

In this example, leakage happens because we are using the DFT on a segment that becomes discontinuous when we treat it as periodic.
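Leakage can also be measured without thinkdsp. The sketch below uses NumPy's FFT directly and, as an assumption made for convenience, a frame rate of 44,000 Hz so that one period of a 440 Hz sine is exactly 100 samples. The function name `peak_power_fraction` is hypothetical.

```python
import numpy as np

def peak_power_fraction(n_periods, freq=440, framerate=44000):
    """Fraction of total spectral power that lands in the largest DFT bin."""
    n = round(n_periods * framerate / freq)   # number of samples in the segment
    ts = np.arange(n) / framerate
    ys = np.sin(2 * np.pi * freq * ts)
    power = np.abs(np.fft.rfft(ys)) ** 2
    return power.max() / power.sum()

periodic = peak_power_fraction(30)      # a whole number of periods
leaky = peak_power_fraction(30.25)      # an extra quarter period
```

With a whole number of periods, essentially all of the power sits in one bin. With the extra quarter period, a noticeable share spreads into neighboring bins.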

Windowing

We can reduce leakage by smoothing out the discontinuity between the beginning and end of the segment, and one way to do that is windowing.

A “window” is a function designed to transform a non-periodic segment into something that can pass for periodic. Figure 3-5 (top) shows a segment where the end does not connect smoothly to the beginning.

Figure 3-5 (middle) shows a “Hamming window”, one of the more common window functions. No window function is perfect, but some can be shown to be optimal for different applications, and Hamming is a good, all-purpose window.

Figure 3-5 (bottom) shows the result of multiplying the window by the original signal. Where the window is close to 1, the signal is unchanged. Where the window is close to 0, the signal is attenuated. Because the window tapers at both ends, the end of the segment connects smoothly to the beginning.

Figure 3-5. Segment of a sinusoid (top), Hamming window (middle), and the product of the segment and the window (bottom).

Here’s what the code looks like. Wave provides window, which applies a Hamming window:

#class Wave:
    def window(self, window):
        self.ys *= window

And NumPy provides hamming, which computes a Hamming window with a given length:

window = np.hamming(len(wave))
wave.window(window)

NumPy provides functions to compute other window functions, including bartlett, blackman, hanning, and kaiser. One of the exercises at the end of this chapter asks you to experiment with these other windows.
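To see the effect of a window numerically, this NumPy-only sketch compares how much power ends up far from the peak, with and without a Hamming window. The helper `far_leakage`, the guard width of 5 bins, and the 44,000 Hz frame rate are all assumptions chosen for illustration.

```python
import numpy as np

framerate, freq = 44000, 440
n = 3025                                   # 30.25 periods: a non-periodic segment
ts = np.arange(n) / framerate
ys = np.sin(2 * np.pi * freq * ts)

def far_leakage(segment, peak_bin=30, guard=5):
    """Fraction of power more than `guard` bins away from the peak."""
    power = np.abs(np.fft.rfft(segment)) ** 2
    bins = np.arange(len(power))
    far = np.abs(bins - peak_bin) > guard
    return power[far].sum() / power.sum()

raw = far_leakage(ys)
windowed = far_leakage(ys * np.hamming(n))
```

The windowed segment leaks less power into distant bins than the raw segment. Swapping np.hamming for np.bartlett, np.blackman, or np.hanning in the last line is one way to start comparing windows.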

Implementing Spectrograms

Now that we understand windowing, we can understand the implementation of make_spectrogram. Here is the Wave method that computes spectrograms:

#class Wave:
    def make_spectrogram(self, seg_length):
        window = np.hamming(seg_length)
        i, j = 0, seg_length
        # integer division keeps i and j usable as slice indices in Python 3
        step = seg_length // 2

        # map from the midpoint time of each segment to its Spectrum
        spec_map = {}

        while j < len(self.ys):
            segment = self.slice(i, j)
            segment.window(window)

            # the nominal time of the segment is its midpoint
            t = (segment.start + segment.end) / 2
            spec_map[t] = segment.make_spectrum()

            i += step
            j += step

        return Spectrogram(spec_map, seg_length)

This is the longest function in the book, so if you can handle this, you can handle anything.

The parameter, self, is a Wave object. seg_length is the number of samples in each segment.

window is a Hamming window with the same length as the segments.

i and j are the slice indices that select segments from the wave. step is the offset between segments. Since step is half of seg_length, the segments overlap by half. Figure 3-6 shows what these overlapping windows look like.

Figure 3-6. Overlapping Hamming windows.

spec_map is a dictionary that maps from a timestamp to a Spectrum.

Inside the while loop, we select a slice from the wave and apply the window; then we construct a Spectrum object and add it to spec_map. The nominal time of each segment, t, is the midpoint.

Then we advance i and j, and continue as long as j doesn’t go past the end of the Wave.

Finally, the method constructs and returns a Spectrogram object. Here is the definition of the class:

class Spectrogram(object):
    def __init__(self, spec_map, seg_length):
        self.spec_map = spec_map
        self.seg_length = seg_length

Like many init methods, this one just stores the parameters as attributes.

Spectrogram provides plot, which generates a pseudocolor plot with time along the x-axis and frequency along the y-axis.

And that’s how Spectrograms are implemented.
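For readers who want to try the same idea outside thinkdsp, here is a rough NumPy-only sketch of the procedure: half-overlapping segments, a Hamming window on each, and one FFT per segment. The function `stft_peaks` is hypothetical; instead of storing whole spectra it keeps only the peak frequency of each segment, and its loop bounds are a simplification rather than a copy of make_spectrogram.

```python
import numpy as np

def stft_peaks(ys, framerate, seg_length):
    """Return segment midpoint times and the peak frequency of each segment."""
    window = np.hamming(seg_length)
    step = seg_length // 2                       # half-overlapping segments
    fs = np.fft.rfftfreq(seg_length, d=1 / framerate)
    times, peaks = [], []
    for i in range(0, len(ys) - seg_length + 1, step):
        segment = ys[i:i + seg_length] * window  # taper each segment
        amps = np.abs(np.fft.rfft(segment))
        times.append((i + seg_length / 2) / framerate)
        peaks.append(fs[np.argmax(amps)])
    return np.array(times), np.array(peaks)

# One-second linear chirp from 220 to 440 Hz, built by accumulating phase.
framerate = 11025
ts = np.arange(framerate) / framerate
freqs = np.linspace(220, 440, len(ts) - 1)
phases = np.insert(np.cumsum(2 * np.pi * freqs * np.diff(ts)), 0, 0)
ys = np.cos(phases)

times, peaks = stft_peaks(ys, framerate, seg_length=512)
```

The peak frequency climbs from roughly 220 Hz in the first segment to roughly 440 Hz in the last, in steps no finer than the frequency resolution of the segments.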

Exercises

Solutions to these exercises are in chap03soln.ipynb.

Exercise 3-1.

Run and listen to the examples in chap03.ipynb, which is in the repository for this book, and also available at http://tinyurl.com/thinkdsp03.

In the leakage example, try replacing the Hamming window with one of the other windows provided by NumPy, and see what effect they have on leakage. See http://docs.scipy.org/doc/numpy/reference/routines.window.html.

Exercise 3-2.

Write a class called SawtoothChirp that extends Chirp and overrides evaluate to generate a sawtooth waveform with frequency that increases (or decreases) linearly.

Hint: combine the evaluate functions from Chirp and SawtoothSignal.

Draw a sketch of what you think the spectrogram of this signal looks like, and then plot it. The effect of aliasing should be visually apparent, and if you listen carefully, you can hear it.

Exercise 3-3.

Make a sawtooth chirp that sweeps from 2500 to 3000 Hz, then use it to make a wave with duration 1 s and frame rate 20 kHz. Draw a sketch of what you think the spectrum will look like. Then plot the spectrum and see if you got it right.

Exercise 3-4.

In musical terminology, a “glissando” is a note that slides from one pitch to another, so it is similar to a chirp.

Find or make a recording of a glissando and plot a spectrogram of the first few seconds. One suggestion: George Gershwin’s Rhapsody in Blue, which you can download from http://archive.org/details/rhapblue11924, starts with a famous clarinet glissando.

Exercise 3-5.

A trombone player can play a glissando by extending the trombone slide while blowing continuously. As the slide extends, the total length of the tube gets longer, and the resulting pitch is inversely proportional to length.

Assuming that the player moves the slide at a constant speed, how does frequency vary with time?

Write a class called TromboneGliss that extends Chirp and provides evaluate. Make a wave that simulates a trombone glissando from C3 up to F3 and back down to C3. C3 is 262 Hz; F3 is 349 Hz.

Plot a spectrogram of the resulting wave. Is a trombone glissando more like a linear or an exponential chirp?

Exercise 3-6.

Make or find a recording of a series of vowel sounds and look at the spectrogram. Can you identify different vowels?

1 Beginning a method name with an underscore makes it “private”, indicating that it is not part of the API that should be used outside the class definition.

Chapter 4. Noise

In English, “noise” means an unwanted or unpleasant sound. In the context of signal processing, it has two different senses:

  1. As in English, it can mean an unwanted signal of any kind. If two signals interfere with each other, each signal would consider the other to be noise.

  2. “Noise” also refers to a signal that contains components at many frequencies, so it lacks the harmonic structure of the periodic signals we saw in previous chapters.

This chapter is about the second kind.

The code for this chapter is in chap04.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp04.

Uncorrelated Noise

The simplest way to understand noise is to generate it, and the simplest kind to generate is uncorrelated uniform noise (UU noise). “Uniform” means the signal contains random values from a uniform distribution; that is, every value in the range is equally likely. “Uncorrelated” means that the values are independent; that is, knowing one value provides no information about the others.

Here’s a class that represents UU noise:

class UncorrelatedUniformNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.uniform(-self.amp, self.amp, len(ts))
        return ys

UncorrelatedUniformNoise inherits from _Noise, which inherits from Signal.

As usual, the evaluate function takes ts, the times when the signal should be evaluated. It uses np.random.uniform, which generates values from a uniform distribution. In this example, the values are in the range between -amp and amp.

The following example generates UU noise with duration 0.5 seconds at 11,025 samples per second:

signal = thinkdsp.UncorrelatedUniformNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)

If you play this wave, it sounds like the static you hear if you tune a radio between channels. Figure 4-1 shows what the waveform looks like. As expected, it looks pretty random.

Figure 4-1. Waveform of uncorrelated uniform noise.

Now let’s take a look at the spectrum:

spectrum = wave.make_spectrum()
spectrum.plot_power()

Spectrum.plot_power is similar to Spectrum.plot, except that it plots power instead of amplitude. Power is the square of amplitude. I am switching from amplitude to power in this chapter because it is more conventional in the context of noise.

Figure 4-2 shows the result. Like the signal, the spectrum looks pretty random. In fact, it is random, but we have to be more precise about the word “random”. There are at least three things we might like to know about a noise signal or its spectrum:

Distribution

The distribution of a random signal is the set of possible values and their probabilities. For example, in the uniform noise signal, the set of values is the range from –1 to 1, and all values have the same probability. An alternative is Gaussian noise, where the set of values is the range from negative to positive infinity, but values near 0 are the most likely, with probability that drops off according to the Gaussian or “bell” curve.

Correlation

Is each value in the signal independent from the others, or are there dependencies between them? In UU noise, the values are independent. An alternative is Brownian noise, where each value is the sum of the previous value and a random “step”. So if the value of the signal is high at a particular point in time, we expect it to stay high, and if it is low, we expect it to stay low.

Relationship between power and frequency

In the spectrum of UU noise, the power at all frequencies is drawn from the same distribution; that is, the average power is the same for all frequencies. An alternative is pink noise, where power is inversely related to frequency; that is, the power at frequency f is drawn from a distribution whose mean is proportional to $1/f$.

Figure 4-2. Power spectrum of uncorrelated uniform noise.

Integrated Spectrum

For UU noise we can see the relationship between power and frequency more clearly by looking at the integrated spectrum, which is a function of frequency, f, that shows the cumulative power in the spectrum up to f.

Spectrum provides a method that computes the IntegratedSpectrum:

def make_integrated_spectrum(self):
    cs = np.cumsum(self.power)
    cs /= cs[-1]
    return IntegratedSpectrum(cs, self.fs)

self.power is a NumPy array containing the power for each frequency. np.cumsum computes the cumulative sum of the powers. Dividing through by the last element normalizes the integrated spectrum so it runs from 0 to 1.

The result is an IntegratedSpectrum. Here is the class definition:

class IntegratedSpectrum(object):    
    def __init__(self, cs, fs):
        self.cs = cs
        self.fs = fs

Like Spectrum, IntegratedSpectrum provides plot_power, so we can compute and plot the integrated spectrum like this:

integ = spectrum.make_integrated_spectrum()
integ.plot_power()

The result, shown in Figure 4-3, is a straight line, which indicates that power at all frequencies is constant, on average. Noise with equal power at all frequencies is called white noise by analogy with light, because an equal mixture of light at all visible frequencies is white.
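The straight line can be checked numerically with a NumPy-only sketch of the same computation. The seed and variable names are arbitrary assumptions; np.fft.rfft stands in for make_spectrum here.

```python
import numpy as np

rng = np.random.default_rng(17)          # seed chosen arbitrarily
ys = rng.uniform(-1, 1, 11025)           # one second of UU noise at 11,025 Hz

power = np.abs(np.fft.rfft(ys)) ** 2     # power is amplitude squared
cs = np.cumsum(power)
cs /= cs[-1]                             # normalize so the curve ends at 1

halfway = cs[len(cs) // 2]               # cumulative power at the middle frequency
```

If power is spread evenly across frequencies, about half of it lies below the middle frequency, so halfway should be close to 0.5.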

Figure 4-3. Integrated spectrum of uncorrelated uniform noise.

Brownian Noise

UU noise is uncorrelated, which means that each value does not depend on the others. An alternative is Brownian noise, in which each value is the sum of the previous value and a random “step”.

It is called “Brownian” by analogy with Brownian motion, in which a particle suspended in a fluid moves apparently at random, due to unseen interactions with the fluid. Brownian motion is often described using a random walk, which is a mathematical model of a path where the distance between steps is characterized by a random distribution.

In a one-dimensional random walk, the particle moves up or down by a random amount at each timestep. The location of the particle at any point in time is the sum of all previous steps.

This observation suggests a way to generate Brownian noise: generate uncorrelated random steps and then add them up. Here is a class definition that implements this algorithm:

class BrownianNoise(_Noise):

    def evaluate(self, ts):
        dys = np.random.uniform(-1, 1, len(ts))
        ys = np.cumsum(dys)
        ys = normalize(unbias(ys), self.amp)
        return ys

evaluate uses np.random.uniform to generate an uncorrelated signal and np.cumsum to compute its cumulative sum.

Since the sum is likely to escape the range from –1 to 1, we have to use unbias to shift the mean to 0, and normalize to get the desired maximum amplitude.

Here’s the code that generates a BrownianNoise object and plots the waveform:

signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
wave.plot()

Figure 4-4 shows the result. The waveform wanders up and down, but there is a clear correlation between successive values. When the amplitude is high, it tends to stay high, and vice versa.

Figure 4-4. Waveform of Brownian noise.

If you plot the spectrum of Brownian noise on a linear scale, as in Figure 4-5 (left), it doesn’t look like much. Nearly all of the power is at the lowest frequencies; the higher frequency components are not visible.

Figure 4-5. Spectrum of Brownian noise on a linear scale (left) and log-log scale (right).

To see the shape of the spectrum more clearly, we can plot power and frequency on a log-log scale. Here’s the code:

spectrum = wave.make_spectrum()
spectrum.plot_power(linewidth=1, alpha=0.5)
thinkplot.config(xscale='log', yscale='log')

The result is in Figure 4-5 (right). The relationship between power and frequency is noisy, but roughly linear.

Spectrum provides estimate_slope, which uses SciPy to compute a least squares fit to the power spectrum:

#class Spectrum

    def estimate_slope(self):
        x = np.log(self.fs[1:])
        y = np.log(self.power[1:])
        t = scipy.stats.linregress(x,y)
        return t

It discards the first component of the spectrum because this component corresponds to $f = 0$, and $\log 0$ is undefined.

estimate_slope returns the result from scipy.stats.linregress, which is an object that contains the estimated slope and intercept, coefficient of determination ($R^2$), p-value, and standard error. For our purposes, we only need the slope.

For Brownian noise, the slope of the power spectrum is –2 (we’ll see why in Chapter 9), so we can write this relationship:

$$\log P = k - 2 \log f$$

where P is power, f is frequency, and k is the intercept of the line, which is not important for our purposes. Exponentiating both sides yields:

$$P = K / f^2$$

where K is $e^k$, but still not important. More relevant is that power is proportional to $1/f^2$, which is characteristic of Brownian noise.
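As a rough check on that slope, the sketch below generates Brownian noise with NumPy and fits a line to the log-log power spectrum. It uses np.polyfit rather than scipy.stats.linregress, a substitution made only to keep the example free of SciPy; the seed and sample count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
dys = rng.uniform(-1, 1, 5512)           # uncorrelated random steps
ys = np.cumsum(dys)                      # Brownian noise: running sum of the steps
ys -= ys.mean()                          # unbias: shift the mean to 0

fs = np.fft.rfftfreq(len(ys), d=1 / 11025)
power = np.abs(np.fft.rfft(ys)) ** 2

# Skip the first component (f = 0), where the logarithm is undefined.
slope, intercept = np.polyfit(np.log(fs[1:]), np.log(power[1:]), 1)
```

For a finite, sampled signal the estimate is usually near –2 but not exactly –2. It tends to come out somewhat shallower, so treat the fitted value as approximate.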

Brownian noise is also called red noise, for the same reason that white noise is called “white”. If you combine visible light with power proportional to $1/f^2$, most of the power will be at the low-frequency end of the spectrum, which is red. Brownian noise is also sometimes called “brown noise”, but I think that’s confusing, so I won’t use it.

Pink Noise

For red noise, the relationship between frequency and power is:

$$P = K / f^{2}$$

There is nothing special about the exponent 2. More generally, we can synthesize noise with any exponent, β:

$$P = K / f^{\beta}$$

When $\beta = 0$, power is constant at all frequencies, so the result is white noise. When $\beta = 2$ the result is red noise.

When β is between 0 and 2, the result is between white and red noise, so it is called pink noise.

There are several ways to generate pink noise. The simplest is to generate white noise and then apply a low-pass filter with the desired exponent. thinkdsp provides a class that represents a pink noise signal:

class PinkNoise(_Noise):

    def __init__(self, amp=1.0, beta=1.0):
        self.amp = amp
        self.beta = beta

amp is the desired amplitude of the signal. beta is the desired exponent. PinkNoise provides make_wave, which generates a Wave:

def make_wave(self, duration=1, start=0, framerate=11025):
    signal = UncorrelatedUniformNoise()
    wave = signal.make_wave(duration, start, framerate)
    spectrum = wave.make_spectrum()

    spectrum.pink_filter(beta=self.beta)

    wave2 = spectrum.make_wave()
    wave2.unbias()
    wave2.normalize(self.amp)
    return wave2

duration is the length of the wave in seconds. start is the start time of the wave; it is included so that make_wave has the same interface for all types of signal, but for random noise, start time is irrelevant. framerate is the number of samples per second.

make_wave creates a white noise wave, computes its spectrum, applies a filter with the desired exponent, and then converts the filtered spectrum back to a wave. Then it unbiases and normalizes the wave.

Spectrum provides pink_filter:

def pink_filter(self, beta=1.0):
    denom = self.fs ** (beta/2.0)
    denom[0] = 1
    self.hs /= denom

pink_filter divides each element of the spectrum by $f^{\beta/2}$. Since power is the square of amplitude, this operation divides the power at each component by $f^{\beta}$. It treats the component at $f = 0$ as a special case, partly to avoid dividing by 0 and partly because this element represents the bias of the signal, which we are going to set to 0 anyway.
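
The whole pipeline (white noise, spectrum, filter, back to a wave, unbias, normalize) can be sketched without thinkdsp. The version below is a minimal NumPy approximation, not the library code: the function name `make_pink_noise` and its `seed` parameter are hypothetical, and normalization is assumed to mean scaling the peak magnitude to `amp`.

```python
import numpy as np

def make_pink_noise(n, framerate, beta=1.0, amp=1.0, seed=None):
    """Sketch: shape white noise so that power is proportional to 1/f**beta."""
    rng = np.random.default_rng(seed)
    ys = rng.uniform(-1, 1, size=n)        # uncorrelated uniform (white) noise
    hs = np.fft.rfft(ys)
    fs = np.fft.rfftfreq(n, d=1.0 / framerate)

    denom = fs ** (beta / 2.0)             # amplitude filter; power is divided by fs**beta
    denom[0] = 1                           # avoid dividing by zero at f = 0
    hs = hs / denom

    out = np.fft.irfft(hs, n)
    out -= out.mean()                      # unbias
    out *= amp / np.max(np.abs(out))       # normalize so the peak magnitude is amp
    return out

framerate = 11025
pink = make_pink_noise(16384, framerate, beta=1.0, seed=3)

# Check the result: the log-log slope of the power spectrum should be close to -beta.
power = np.abs(np.fft.rfft(pink)) ** 2
fs = np.fft.rfftfreq(len(pink), d=1.0 / framerate)
slope = np.polyfit(np.log(fs[1:]), np.log(power[1:]), 1)[0]
print(round(slope, 2))
```

Because the filter is applied directly in the frequency domain, the fitted slope tracks –β closely; trying `beta=0` and `beta=2` should reproduce the white and red cases.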

Figure 4-6 shows the resulting waveform. Like that of Brownian noise, it wanders up and down in a way that suggests correlation between successive values, but at least visually, it looks more random. In the next chapter we will come back to this observation and I will be more precise about what I mean by “correlation” and “more random”.

Figure 4-6. Waveform of pink noise.

Finally, Figure 4-7 shows a spectrum for white, pink, and red noise on the same log-log scale. The relationship between the exponent, β, and the slope of the spectrum is apparent in this figure.

Figure 4-7. Spectrum of white, pink, and red noise on a log-log scale.

Gaussian Noise

We started with uncorrelated uniform (UU) noise and showed that, because its spectrum has equal power at all frequencies, on average, UU noise is white.

But when people talk about “white noise”, they don’t always mean UU noise. In fact, more often they mean uncorrelated Gaussian (UG) noise.

thinkdsp provides an implementation of UG noise:

class UncorrelatedGaussianNoise(_Noise):

    def evaluate(self, ts):
        ys = np.random.normal(0, self.amp, len(ts))
        return ys

np.random.normal returns a NumPy array of values from a Gaussian distribution, in this case with mean 0 and standard deviation self.amp. In theory the range of values is from negative to positive infinity, but we expect about 99.7% of the values to fall within three standard deviations of the mean, that is, between –3 and 3 when self.amp is 1.

UG noise is similar in many ways to UU noise. The spectrum has equal power at all frequencies, on average, so UG is also white. And it has one other interesting property: the spectrum of UG noise is also UG noise. More precisely, the real and imaginary parts of the spectrum are uncorrelated Gaussian values.

To test that claim, we can generate the spectrum of UG noise and then generate a “normal probability plot”, which is a graphical way to test whether a distribution is Gaussian:

signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
spectrum = wave.make_spectrum()

thinkstats2.NormalProbabilityPlot(spectrum.real)
thinkstats2.NormalProbabilityPlot(spectrum.imag)

NormalProbabilityPlot is provided by thinkstats2, which is included in the repository for this book. If you are not familiar with normal probability plots, you can read about them in Chapter 5 of Think Stats at http://thinkstats2.com.

Figure 4-8 shows the results. The gray lines show a linear model fit to the data; the dark lines show the data.

A straight line on a normal probability plot indicates that the data come from a Gaussian distribution. Except for some random variation at the extremes, these lines are straight, which indicates that the spectrum of UG noise is UG noise.

The spectrum of UU noise is also UG noise, at least approximately. In fact, by the Central Limit Theorem, the spectrum of almost any uncorrelated noise is approximately Gaussian, as long as the distribution has finite mean and standard deviation, and the number of samples is large.
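
One way to probe the Central Limit Theorem claim numerically, without plotting, is to measure how straight the normal probability plot would be. The sketch below is an assumption-laden stand-in for thinkstats2.NormalProbabilityPlot: the helper `normal_probability_corr` is hypothetical, and it simply correlates the sorted data with a sorted standard-normal sample of the same size, so a value near 1 means a nearly straight line.

```python
import numpy as np

rng = np.random.default_rng(5)
ys = rng.uniform(-1, 1, size=16384)     # uncorrelated uniform (UU) noise
hs = np.fft.rfft(ys)[1:-1]              # drop f = 0 and the Nyquist component (purely real)

def normal_probability_corr(values, rng):
    """Correlate sorted data with a sorted standard-normal sample.

    A result close to 1 means the normal probability plot is close to a straight line.
    """
    xs = np.sort(rng.normal(0, 1, size=len(values)))
    return np.corrcoef(xs, np.sort(values))[0, 1]

r_real = normal_probability_corr(hs.real, rng)
r_imag = normal_probability_corr(hs.imag, rng)
r_raw = normal_probability_corr(ys, rng)    # the uniform samples themselves, for contrast
print(r_real, r_imag, r_raw)
```

The real and imaginary parts of the spectrum should score noticeably closer to 1 than the raw uniform samples do, which is consistent with the spectrum being approximately Gaussian even though the signal is not.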

Figure 4-8. Normal probability plot for the real and imaginary parts of the spectrum of Gaussian noise.

Exercises

Solutions to these exercises are in chap04soln.ipynb.

Exercise 4-1.

A Soft Murmur is a website that plays a mixture of natural noise sources, including rain, waves, wind, etc. At http://asoftmurmur.com/about/ you can find their list of recordings, most of which are at http://freesound.org.

Download a few of these files and compute the spectrum of each signal. Does the power spectrum look like white noise, pink noise, or Brownian noise? How does the spectrum vary over time?

Exercise 4-2.

In a noise signal, the mixture of frequencies changes over time. In the long run, we expect the power at all frequencies to be equal, but in any sample, the power at each frequency is random.

To estimate the long-term average power at each frequency, we can break a long signal into segments, compute the power spectrum for each segment, and then compute the average across the segments. You can read more about this algorithm at http://en.wikipedia.org/wiki/Bartlett's_method.

Implement Bartlett’s method and use it to estimate the power spectrum for a noise wave. Hint: look at the implementation of make_spectrogram.

Exercise 4-3.

At http://www.coindesk.com/price, you can download historical data on the daily price of a BitCoin as a CSV file. Read this file and compute the spectrum of BitCoin prices as a function of time. Does it resemble white, pink, or Brownian noise?

Exercise 4-4.

A Geiger counter is a device that detects radiation. When an ionizing particle strikes the detector, it outputs a surge of current. The total output at a point in time can be modeled as uncorrelated Poisson (UP) noise, where each sample is a random quantity from a Poisson distribution, which corresponds to the number of particles detected during an interval.

Write a class called UncorrelatedPoissonNoise that inherits from thinkdsp._Noise and provides evaluate. It should use np.random.poisson to generate random values from a Poisson distribution. The parameter of this function, lam, is the average number of particles during each interval. You can use the attribute amp to specify lam. For example, if the frame rate is 10 kHz and amp is 0.001, we expect about 10 “clicks” per second.

Generate about a second of UP noise and listen to it. For low values of amp, like 0.001, it should sound like a Geiger counter. For higher values it should sound like white noise. Compute and plot the power spectrum to see whether it looks like white noise.

Exercise 4-5.

The algorithm in this chapter for generating pink noise is conceptually simple but computationally expensive. There are more efficient alternatives, like the Voss–McCartney algorithm. Research this method, implement it, compute the spectrum of the result, and confirm that it has the desired relationship between power and frequency.

Chapter 5. Autocorrelation

In the previous chapter I characterized white noise as “uncorrelated”, which means that each value is independent of the others, and Brownian noise as “correlated”, because each value depends on the preceding value. In this chapter I define these terms more precisely and present the autocorrelation function, which is a useful tool for signal analysis.

The code for this chapter is in chap05.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp05.

Correlation

In general, correlation between variables means that if you know the value of one, you have some information about the other. There are several ways to quantify correlation, but the most common is the Pearson product-moment correlation coefficient, usually denoted ρ. For two variables, x and y, that each contain N values:

$$\rho = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{N \sigma_x \sigma_y}$$

where $\mu_x$ and $\mu_y$ are the means of x and y, and $\sigma_x$ and $\sigma_y$ are their standard deviations.

Pearson’s correlation is always between –1 and +1 (including both). If ρ is positive, we say that the correlation is positive, which means that when one variable is high, the other tends to be high. If ρ is negative, the correlation is negative, so when one variable is high, the other tends to be low.

The magnitude of ρ indicates the strength of the correlation. If ρ is 1 or –1, the variables are perfectly correlated, which means that if you know one, you can make a perfect prediction about the other. If ρ is near zero, the correlation is probably weak, so if you know one, it doesn’t tell you much about the other.

I say “probably weak” because it is also possible that there is a nonlinear relationship that is not captured by the coefficient of correlation. Nonlinear relationships are often important in statistics, but less often relevant for signal processing, so I won’t say more about them here.

Python provides several ways to compute correlations. np.corrcoef takes any number of variables and computes a correlation matrix that includes correlations between each pair of variables.

I’ll present an example with only two variables. First, I define a function that constructs sine waves with different phase offsets:

def make_sine(offset):
    signal = thinkdsp.SinSignal(freq=440, offset=offset)
    wave = signal.make_wave(duration=0.5, framerate=10000)
    return wave

Next I instantiate two waves with different offsets:

wave1 = make_sine(offset=0)
wave2 = make_sine(offset=1)

Figure 5-1 shows what the first few periods of these waves look like. When one wave is high, the other is usually high, so we expect them to be correlated:

>>> corr_matrix = np.corrcoef(wave1.ys, wave2.ys, ddof=0)
[[ 1.    0.54]
 [ 0.54  1.  ]]

The option ddof=0 indicates that corrcoef should divide by N, as in the equation above, rather than use the default, N – 1. In current versions of NumPy, the ddof argument of corrcoef is deprecated and has no effect on the result, because the factor cancels in the ratio, so it can safely be omitted.

The result is a correlation matrix. The first element is the correlation of wave1 with itself, which is always 1. Similarly, the last element is the correlation of wave2 with itself.

The off-diagonal elements contain the value we’re interested in, the correlation of wave1 and wave2. The value 0.54 indicates that the strength of the correlation is moderate.
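
To connect the number back to the definition of ρ, here is a small sketch that evaluates the formula term by term and compares it with np.corrcoef. It builds the two sine waves directly with NumPy rather than thinkdsp, on the assumption that the offset is a phase in radians added inside the sine; the helper name `pearson` is hypothetical.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation, dividing by N as in the equation in the text."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # np.std divides by N by default, matching the definition of sigma used here.
    return np.sum(dx * dy) / (len(x) * x.std() * y.std())

framerate = 10000
ts = np.arange(int(0.5 * framerate)) / framerate
ys1 = np.sin(2 * np.pi * 440 * ts)         # offset 0
ys2 = np.sin(2 * np.pi * 440 * ts + 1)     # offset 1 radian

rho = pearson(ys1, ys2)
print(round(rho, 2))                                   # 0.54
print(np.isclose(rho, np.corrcoef(ys1, ys2)[0, 1]))    # True
```

Because the half-second segment contains a whole number of periods, the result equals cos(1) up to floating-point error.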

Figure 5-1. Two sine waves that differ by a phase offset of 1 radian; their coefficient of correlation is 0.54.

As the phase offset increases, this correlation decreases until the waves are 180 degrees out of phase, which yields correlation –1. Then it increases until the offset differs by 360 degrees. At that point we have come full circle and the correlation is 1.

Figure 5-2 shows the relationship between correlation and phase offset for a sine wave. The shape of that curve should look familiar; it is a cosine.
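
The cosine shape is easy to confirm numerically. The following sketch sweeps the phase offset over one full circle and compares each correlation with the cosine of the offset; the 440 Hz tone, the sample count, and the number of offsets are choices made for this example rather than values taken from the figure.

```python
import numpy as np

framerate = 10000
ts = np.arange(int(0.5 * framerate)) / framerate
reference = np.sin(2 * np.pi * 440 * ts)

offsets = np.linspace(0, 2 * np.pi, 41)
corrs = np.array([
    np.corrcoef(reference, np.sin(2 * np.pi * 440 * ts + offset))[0, 1]
    for offset in offsets
])

# Largest gap between the measured correlations and cos(offset).
max_gap = np.max(np.abs(corrs - np.cos(offsets)))
print(max_gap)
```

The gap should be at the level of floating-point rounding, and the correlation at an offset of π radians (180 degrees) should be –1.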

Figure 5-2. The correlation of two sine waves as a function of the phase offset between them. The result is a cosine.

thinkdsp provides a simple interface for computing the correlation between waves:

>>> wave1.corr(wave2)
0.54

Serial Correlation

Signals often represent measurements of quantities that vary in time. For example, the sound signals we’ve worked with represent measurements of voltage (or current), which correspond to the changes in air pressure we perceive as sound.

Measurements like these almost always have serial correlation, which is the correlation between each element and the next (or the previous). To compute serial correlation, we can shift a signal and then compute the correlation of the shifted version with the original:

def serial_corr(wave, lag=1):
    n = len(wave)
    y1 = wave.ys[lag:]
    y2 = wave.ys[:n-lag]
    corr = np.corrcoef(y1, y2, ddof=0)[0, 1]
    return corr

serial_corr takes a Wave object and lag, which is the integer number of places to shift the wave. It computes the correlation of the wave with a shifted version of itself.

We can test this function with the noise signals from the previous chapter. We expect uncorrelated noise, such as the UG noise used here, to be uncorrelated, based on the way it’s generated (not to mention the name):

signal = thinkdsp.UncorrelatedGaussianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)

When I ran this example, I got 0.006, which indicates a very small serial correlation. You might get a different value when you run it, but it should be comparably small.

In a Brownian noise signal, each value is the sum of the previous value and a random “step”, so we expect a strong serial correlation:

signal = thinkdsp.BrownianNoise()
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)

Sure enough, the result I got is greater than 0.999.
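
The same contrast can be reproduced on plain NumPy arrays. This sketch is an approximation of the experiment rather than the thinkdsp code: `serial_corr_array` is a hypothetical array-based counterpart of serial_corr, and the Brownian signal is assumed to be a running sum of uniform random steps.

```python
import numpy as np

def serial_corr_array(ys, lag=1):
    """Correlation between an array and a copy of itself shifted by `lag` samples."""
    n = len(ys)
    return np.corrcoef(ys[lag:], ys[:n - lag])[0, 1]

rng = np.random.default_rng(11)
white = rng.normal(0, 1, size=5512)                   # uncorrelated Gaussian noise
brownian = np.cumsum(rng.uniform(-1, 1, size=5512))   # running sum of random steps

white_corr = serial_corr_array(white)
brownian_corr = serial_corr_array(brownian)
print(white_corr)       # small, near 0
print(brownian_corr)    # close to 1
```

The exact values depend on the random seed, but the white-noise correlation should stay within a few hundredths of zero while the Brownian one should be very close to 1.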

Since pink noise is in some sense between Brownian noise and UU noise, we might expect an intermediate correlation:

signal = thinkdsp.PinkNoise(beta=1)
wave = signal.make_wave(duration=0.5, framerate=11025)
serial_corr(wave)

With parameter $\beta = 1$, I got a serial correlation of 0.851. As we vary the parameter from $\beta = 0$, which is uncorrelated noise, to $\beta = 2$, which is Brownian, serial correlation ranges from 0 to almost 1, as shown in Figure 5-3.

Figure 5-3. Serial correlation for pink noise with a range of parameters.

Autocorrelation

In the previous section we computed the correlation between each value and the next, so we shifted the elements of the array by 1. But we can easily compute serial correlations with different lags.

You can think of serial_corr as a function that maps from each value of lag to the corresponding correlation, and we can evaluate that function by looping through values of lag:

def autocorr(wave):
    lags = range(len(wave.ys)//2)
    corrs = [serial_corr(wave, lag) for lag in lags]
    return lags, corrs

autocorr takes a Wave object and returns the autocorrelation function as a pair of sequences: lags is a sequence of integers from 0 to half the length of the wave; corrs is the sequence of serial correlations for each lag.

Figure 5-4 shows autocorrelation functions for pink noise with three values of β. For low values of β, the signal is less correlated, and the autocorrelation function drops off to zero quickly. For larger values, serial correlation is stronger and drops off more slowly. With the largest value of β, serial correlation is strong even for long lags; this phenomenon is called long-range dependence, because it indicates that each value in the signal depends on many preceding values.

Figure 5-4. Autocorrelation functions for pink noise with a range of parameters.

Autocorrelation of Periodic Signals

The autocorrelation of pink noise has interesting mathematical properties, but limited applications. The autocorrelation of periodic signals is more useful.

As an example, I downloaded from https://freesound.org a recording of someone singing a chirp; the repository for this book includes the file (https://github.com/AllenDowney/ThinkDSP/blob/master/code/28042__bcjordan__voicedownbew.wav). You can use the Jupyter notebook for this chapter, chap05.ipynb, to play it.

Figure 5-5 shows the spectrogram of this wave. The fundamental frequency and some of the harmonics show up clearly. The chirp starts near 500 Hz and drops down to about 300 Hz, roughly from C5 to E4.

To estimate pitch at a particular point in time, we could use the spectrum, but it doesn’t work very well. To see why not, I’ll take a short segment from the wave and plot its spectrum:

duration = 0.01
segment = wave.segment(start=0.2, duration=duration)
spectrum = segment.make_spectrum()
spectrum.plot(high=1000)

Figure 5-5. Spectrogram of a vocal chirp.

This segment starts at 0.2 seconds and lasts 0.01 seconds. Figure 5-6 shows its spectrum. There is a clear peak near 400 Hz, but it is hard to identify the pitch precisely. The length of the segment is 441 samples at a frame rate of 44,100 Hz, so the frequency resolution is 100 Hz (see “The Gabor Limit”). That means the estimated pitch might be off by 50 Hz; in musical terms, the range from 350 Hz to 450 Hz is more than 4 semitones, which is a big difference!
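
The arithmetic behind these numbers is short enough to check directly. The sketch below assumes only the values quoted above (a 0.01-second segment at 44,100 Hz) and the standard convention of 12 semitones per octave.

```python
import math

framerate = 44100                     # samples per second
duration = 0.01                       # segment length in seconds
n_samples = round(framerate * duration)

# Spacing between frequency components of the spectrum, equal to 1 / duration.
resolution = framerate / n_samples

# Width of the 350-450 Hz range in semitones (12 semitones per octave).
semitones = 12 * math.log2(450 / 350)

print(n_samples, resolution, round(semitones, 2))   # 441 100.0 4.35
```

Doubling the segment length would halve the spacing to 50 Hz, at the cost of averaging over a longer stretch of a pitch that is changing.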

Figure 5-6. Spectrum of a segment from the vocal chirp.

We could get better frequency resolution by taking a longer segment, but since the pitch is changing over time, we would also get “motion blur”; that is, the peak would spread between the start and end pitch of the segment, as we saw in “Spectrum of a Chirp”.

We can estimate pitch more precisely using autocorrelation. If a signal is periodic, we expect the autocorrelation to spike when the lag equals the period.

To show why that works, I’ll plot two segments from the same recording:

def plot_shifted(wave, offset=0.001, start=0.2):
    thinkplot.preplot(2)
    segment1 = wave.segment(start=start, duration=0.01)
    segment1.plot(linewidth=2, alpha=0.8)

    segment2 = wave.segment(start=start-offset, duration=0.01)
    segment2.shift(offset)
    segment2.plot(linewidth=2, alpha=0.4)

    corr = segment1.corr(segment2)
    text = r'$\rho =$ %.2g' % corr
    thinkplot.text(segment1.start+0.0005, -0.8, text)
    thinkplot.config(xlabel='Time (s)')

With offset=0.0023, one segment starts at 0.2 seconds; the other starts 0.0023 seconds earlier and is shifted forward by the same amount so the two can be overlaid. Figure 5-7 shows the result. The segments are similar, and their correlation is 0.99. This result suggests that the period is near 0.0023 seconds, which corresponds to a frequency of 435 Hz.

Figure 5-7. Two segments from a chirp, one starting 0.0023 seconds before the other.

For this example, I estimated the period by trial and error. To automate the process, we can use the autocorrelation function:

lags, corrs = autocorr(segment)
thinkplot.plot(lags, corrs)

Figure 5-8 shows the autocorrelation function for the segment starting at $t = 0.2$ seconds. The first peak occurs at lag=101. We can compute the frequency that corresponds to that period like this:

lag = 101
period = lag / segment.framerate
frequency = 1 / period

The estimated fundamental frequency is 437 Hz. To evaluate the precision of the estimate, we can run the same computation with lags 100 and 102, which correspond to frequencies 441 and 432 Hz. The frequency precision using autocorrelation is less than 10 Hz, compared with 100 Hz using the spectrum. In musical terms, the expected error is about 30 cents (a third of a semitone).
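
The whole estimate can be rehearsed on a synthetic tone, which makes it easy to see where the numbers come from. Everything about the signal here is an assumption for illustration: a 437 Hz fundamental with two weaker harmonics stands in for the recording, `autocorr_array` is a hypothetical array-based version of autocorr, and the search window of lags 50 to 150 is simply chosen to skip the trivial peak at lag 0.

```python
import numpy as np

framerate = 44100
n = 441                                  # 0.01 s at 44,100 Hz, as in the text
ts = np.arange(n) / framerate
f0 = 437.0                               # hypothetical fundamental of the synthetic tone
ys = (np.sin(2 * np.pi * f0 * ts)
      + 0.5 * np.sin(2 * np.pi * 2 * f0 * ts)
      + 0.25 * np.sin(2 * np.pi * 3 * f0 * ts))

def autocorr_array(ys):
    """Serial correlation for every lag from 0 up to half the length of the array."""
    lags = np.arange(len(ys) // 2)
    corrs = np.array([np.corrcoef(ys[lag:], ys[:len(ys) - lag])[0, 1]
                      for lag in lags])
    return lags, corrs

lags, corrs = autocorr_array(ys)

# Find the first peak away from lag 0 inside the chosen search window.
low, high = 50, 150
lag = low + np.argmax(corrs[low:high])
frequency = framerate / lag

# Neighboring lags bracket the estimate and show its precision.
f_high = framerate / (lag - 1)
f_low = framerate / (lag + 1)
cents = 1200 * np.log2(f_high / f_low)
print(lag, round(frequency, 1), round(f_low, 1), round(f_high, 1), round(cents))
```

Since lags are whole numbers of samples, the estimate can only take values of the form framerate / lag; the spacing between neighboring candidates is what limits the precision.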

Figure 5-8. Autocorrelation function for a segment from a chirp.

Correlation as Dot Product

I started the chapter with this definition of Pearson’s correlation coefficient:

$$\rho = \frac{\sum_i (x_i - \mu_x)(y_i - \mu_y)}{N \sigma_x \sigma_y}$$

Then I used ρ to define serial correlation and autocorrelation. That’s consistent with how these terms are used in statistics, but in the context of signal processing, the definitions are a little different.

In signal processing, we are often working with unbiased signals, where the mean is 0, and normalized signals, where the standard deviation is 1. In that case, the definition of ρ simplifies to:

$$\rho = \frac{1}{N} \sum_i x_i y_i$$

And it is common to simplify even further:

$$r = \sum_i x_i y_i$$

This definition of correlation is not “standardized”, so it doesn’t generally fall between –1 and 1. But it has other useful properties.

If you think of x and y as vectors, you might recognize this formula as the dot product, $x \cdot y$. See http://en.wikipedia.org/wiki/Dot_product.

The dot product indicates the degree to which the signals are similar. If they are unbiased and normalized so their norms (vector lengths) are 1:

$$x \cdot y = \cos \theta$$

where θ is the angle between the vectors. And that explains why Figure 5-2 is a cosine curve.
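
A quick numerical check makes the link between the dot product and ρ concrete. In this sketch the helper `unit_normalize` is hypothetical, and the two random vectors are constructed only so that they are partly correlated; after removing the mean and scaling each vector to length 1, their dot product should match Pearson’s correlation.

```python
import numpy as np

def unit_normalize(ys):
    """Remove the mean, then scale so the vector has length (norm) 1."""
    ys = np.asarray(ys, dtype=float)
    ys = ys - ys.mean()
    return ys / np.linalg.norm(ys)

rng = np.random.default_rng(23)
x = rng.normal(size=1000)
y = 0.6 * x + 0.8 * rng.normal(size=1000)   # correlated with x by construction

xn, yn = unit_normalize(x), unit_normalize(y)
cos_theta = np.dot(xn, yn)                  # dot product of unit vectors = cos(theta)
rho = np.corrcoef(x, y)[0, 1]
theta = np.degrees(np.arccos(cos_theta))
print(round(cos_theta, 3), round(rho, 3), round(theta, 1))
```

Seen this way, a correlation of 1 means the two vectors point in the same direction, 0 means they are perpendicular, and –1 means they point in opposite directions.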

Using NumPy

NumPy provides a function, correlate, that computes the correlation of two functions or the autocorrelation of one function. We can use it to compute the autocorrelation of the segment from the previous section:

corrs2 = np.correlate(segment.ys, segment.ys, mode='same')

The option mode tells correlate what range of lag to use. With the value 'same', the range is from $-N/2$ to $N/2$, where N is the length of the wave array.

Figure 5-9 shows the result. It is symmetric because the two signals are identical, so a negative lag on one has the same effect as a positive lag on the other. To compare with the results from autocorr, we can select the second half:

N = len(corrs2)
half = corrs2[N//2:]

Figure 5-9. Autocorrelation function computed with np.correlate.

If you compare Figure 5-9 to Figure 5-8, you’ll notice that the correlations computed by np.correlate get smaller as the lags increase. That’s because np.correlate uses the unstandardized definition of correlation; as the lag gets bigger, the number of points in the overlap between the two signals gets smaller, so the magnitude of the correlations decreases.

We can correct that by dividing through by the lengths:

lengths = range(N, N//2, -1)
half /= lengths

Finally, we can normalize the results so the correlation with lag=0 is 1:

half /= half[0]

With these adjustments, the results computed by autocorr and np.correlate are nearly the same. They still differ by 1–2%. The reason is not important, but if you are curious: autocorr standardizes the correlations independently for each lag; for np.correlate, we standardized them all at the end.
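
The steps above can be collected into one runnable sketch and compared with the lag-by-lag approach. A synthetic 437 Hz sine is used as a hypothetical stand-in for segment.ys, and the search window of lags 50 to 150 is a choice made for this example; the slice is copied so that the in-place divisions do not alter corrs2.

```python
import numpy as np

framerate = 44100
n = 441
ts = np.arange(n) / framerate
ys = np.sin(2 * np.pi * 437.0 * ts)        # hypothetical stand-in for segment.ys

corrs2 = np.correlate(ys, ys, mode='same')  # lags run from about -n/2 to n/2
N = len(corrs2)
half = corrs2[N // 2:].copy()               # lags 0, 1, 2, ...

lengths = np.arange(N, N // 2, -1)          # number of overlapping samples at each lag
half /= lengths                             # correct for the shrinking overlap
half /= half[0]                             # make the correlation at lag 0 equal to 1

# The same quantity computed lag by lag with the standardized definition.
by_corrcoef = np.array([np.corrcoef(ys[lag:], ys[:n - lag])[0, 1]
                        for lag in range(len(half))])

low, high = 50, 150
peak_np = low + np.argmax(half[low:high])
peak_cc = low + np.argmax(by_corrcoef[low:high])
print(peak_np, peak_cc)
```

Both versions should place the first peak at or very near the same lag, even though the individual values differ slightly because of the different normalizations.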

More importantly, now you know what autocorrelation is, how to use it to estimate the fundamental period of a signal, and two ways to compute it.

Exercises

Solutions to these exercises are in chap05soln.ipynb.

Exercise 5-1.

The Jupyter notebook for this chapter, chap05.ipynb, includes an interaction that lets you compute autocorrelations for different lags. Use this interaction to estimate the pitch of the vocal chirp for a few different start times.

Exercise 5-2.

The example code in chap05.ipynb shows how to use autocorrelation to estimate the fundamental frequency of a periodic signal. Encapsulate this code in a function called estimate_fundamental, and use it to track the pitch of a recorded sound.

To see how well it works, try superimposing your pitch estimates on a spectrogram of the recording.

Exercise 5-3.

If you did the exercises in the previous chapter, you downloaded the historical prices of BitCoins and estimated the power spectrum of the price changes. Using the same data, compute the autocorrelation of BitCoin prices. Does the autocorrelation function drop off quickly? Is there evidence of periodic behavior?

练习 5-4。

本书的存储库中有一个 Jupyter notebook saxophone.ipynb,它探讨了自相关、音高感知以及一种名为“缺失基频”的现象。请通读此 notebook 并运行其中的示例。尝试选择录音中的不同片段并再次运行示例。

In the repository for this book you will find a Jupyter notebook called saxophone.ipynb that explores autocorrelation, pitch perception, and a phenomenon called the missing fundamental. Read through this notebook and run the examples. Try selecting a different segment of the recording and running the examples again.

Vi Hart 制作了一段名为“噪音是怎么回事?(声音、频率和音高的科学与数学)”的精彩视频;它揭示了缺失的基本现象,并解释了音高感知是如何运作的(至少在我们目前所知的范围内)。观看地址:https://www.youtube.com/watch? v=i_0DXxNeaQ0 。

Vi Hart has an excellent video called “What is up with Noises? (The Science and Mathematics of Sound, Frequency, and Pitch)”; it demonstrates the missing fundamental phenomenon and explains how pitch perception works (at least, to the degree that we know). Watch it at https://www.youtube.com/watch?v=i_0DXxNeaQ0.

Chapter 6. Discrete Cosine Transform

The topic of this chapter is the Discrete Cosine Transform (DCT), which is used in MP3 and related formats for compressing music; JPEG and similar formats for images; and the MPEG family of formats for video.

The DCT is similar in many ways to the Discrete Fourier Transform (DFT), which we have been using for spectral analysis. Once we learn how the DCT works, it will be easier to explain the DFT.

Here are the steps to get there:

  1. We’ll start with the synthesis problem: given a set of frequency components and their amplitudes, how can we construct a wave?

  2. Next we’ll rewrite the synthesis problem using NumPy arrays. This move is good for performance, and also provides insight for the next step.

  3. We’ll look at the analysis problem: given a signal and a set of frequencies, how can we find the amplitude of each frequency component? We’ll start with a solution that is conceptually simple but slow.

  4. Finally, we’ll use some principles from linear algebra to find a more efficient algorithm. If you already know linear algebra, that’s great, but I’ll explain what you need as we go.

The code for this chapter is in chap06.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp06.

Synthesis

Suppose I give you a list of amplitudes and a list of frequencies, and ask you to construct a signal that is the sum of these frequency components. Using objects in the thinkdsp module, there is a simple way to perform this operation, which is called synthesis:

def synthesize1(amps, fs, ts):
    components = [thinkdsp.CosSignal(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)

    ys = signal.evaluate(ts)
    return ys

amps is a list of amplitudes, fs is the list of frequencies, and ts is the sequence of times where the signal should be evaluated.

components is a list of CosSignal objects, one for each amplitude–frequency pair. SumSignal represents the sum of these frequency components.

Finally, evaluate computes the value of the signal at each time in ts.

We can test this function like this:

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025

ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)
wave = thinkdsp.Wave(ys, framerate)

This example makes a signal that contains a fundamental frequency at 100 Hz and three harmonics (100 Hz is a sharp G2). It renders the signal for 1 second at 11,025 frames per second and puts the results into a Wave object.

Conceptually, synthesis is pretty simple. But in this form it doesn’t help much with analysis, which is the inverse problem: given the wave, how do we identify the frequency components and their amplitudes?

Synthesis with Arrays

Here’s another way to write synthesize:

def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    ys = np.dot(M, amps)
    return ys

This function looks very different, but it does the same thing. Let’s see how it works:

  1. np.outer computes the outer product of ts and fs. The result is an array with one row for each element of ts and one column for each element of fs. Each element in the array is the product of a frequency and a time, $ft$.

  2. We multiply args by $2\pi$ and apply cos, so each element of the result is $\cos(2\pi f t)$. Since ts runs down the columns, each column contains a cosine signal at a particular frequency, evaluated at a sequence of times.

  3. np.dot multiplies each row of M by amps, elementwise, and then adds up the products. In terms of linear algebra, we are multiplying a matrix, M, by a vector, amps. In terms of signals, we are computing the weighted sum of frequency components.
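To make the three steps concrete, here is a small sketch with made-up values (three sample times and two frequencies, chosen only for illustration). It prints the array shapes and checks the matrix product against an explicit double loop.

```python
import numpy as np

PI2 = 2 * np.pi

ts = np.array([0.0, 0.25, 0.5])   # 3 sample times (illustrative values)
fs = np.array([1.0, 2.0])         # 2 frequencies (illustrative values)
amps = np.array([0.6, 0.4])       # one amplitude per frequency

args = np.outer(ts, fs)           # shape (3, 2): args[n, k] = ts[n] * fs[k]
M = np.cos(PI2 * args)            # column k holds a cosine at frequency fs[k]
ys = np.dot(M, amps)              # shape (3,): weighted sum across the columns

# The same computation written as an explicit loop over rows and columns.
ys_loop = np.array([
    sum(amps[k] * np.cos(PI2 * fs[k] * ts[n]) for k in range(len(fs)))
    for n in range(len(ts))
])

print(args.shape, M.shape, ys.shape)
print(np.allclose(ys, ys_loop))
```

Each row of M corresponds to one time and each column to one frequency, so the dot product with amps yields one output sample per row.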

Figure 6-1 shows the structure of this computation. Each row of the matrix, M, corresponds to a time from 0.0 to 1.0 seconds; $t_n$ is the time of the nth row. Each column corresponds to a frequency from 100 to 400 Hz; $f_k$ is the frequency of the kth column.

Figure 6-1. Synthesis with arrays.

I labeled the nth row with the letters a through d; as an example, the value of a is $\cos[2\pi (100) t_n]$.

The result of the dot product, ys, is a vector with one element for each row of M. The nth element, labeled e, is the sum of products:

$$e = 0.6a + 0.25b + 0.1c + 0.05d$$

And likewise with the other elements of ys. So each element of ys is the sum of four frequency components, evaluated at a point in time and multiplied by the corresponding amplitudes. And that’s exactly what we wanted.

We can use the code from the previous section to check that the two versions of synthesize produce the same results:

ys1 = synthesize1(amps, fs, ts)
ys2 = synthesize2(amps, fs, ts)
max(abs(ys1 - ys2))

The biggest difference between ys1 and ys2 is about 1e-13, which is what we expect due to floating-point errors.

Writing this computation in terms of linear algebra makes the code smaller and faster. Linear algebra provides concise notation for operations on matrices and vectors. For example, we could write synthesize like this:

$$M = \cos(2\pi\, t \otimes f)$$

$$y = M a$$

where a is a vector of amplitudes, t is a vector of times, f is a vector of frequencies, and $\otimes$ is the symbol for the outer product of two vectors.

Analysis

Now we are ready to solve the analysis problem. Suppose I give you a wave and tell you that it is the sum of cosines with a given set of frequencies. How would you find the amplitude for each frequency component? In other words, given ys, ts, and fs, can you recover amps?

In terms of linear algebra, the first step is the same as for synthesis: we compute $M = \cos(2\pi\, t \otimes f)$. Then we want to find a so that $y = Ma$; in other words, we want to solve a linear system. NumPy provides linalg.solve, which does exactly that.

Here’s what the code looks like:

def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps

The first two lines use ts and fs to build the matrix, M. Then np.linalg.solve computes amps.

But there’s a hitch. In general we can only solve a system of linear equations if the matrix is square; that is, if the number of equations (rows) is the same as the number of unknowns (columns).

In this example, we have only 4 frequencies, but we evaluated the signal at 11,025 times. So we have many more equations than unknowns.

In general if ys contains more than 4 elements, it is unlikely that we can analyze it using only 4 frequencies.

But in this case, we know that the ys values were actually generated by adding only 4 frequency components, so we can use any 4 values from the wave array to recover amps.
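As an aside, the overdetermined system can also be handed to a least-squares solver, which uses all of the samples instead of picking 4 of them. This is not the approach the chapter takes; the sketch below swaps in np.linalg.lstsq and assumes the same amps, fs, and ts as the synthesis example.

```python
import numpy as np

PI2 = 2 * np.pi

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.array([100, 200, 300, 400])
framerate = 11025
ts = np.linspace(0, 1, framerate)

M = np.cos(PI2 * np.outer(ts, fs))   # 11025 rows (equations), 4 columns (unknowns)
ys = np.dot(M, amps)                 # the synthesized wave array

# Least-squares solution of M a = y; the system is consistent by construction,
# so the estimate should match amps up to floating-point error.
amps_ls, residuals, rank, singular_values = np.linalg.lstsq(M, ys, rcond=None)
print(amps_ls)
```

Because the wave really was generated from these four components, the least-squares estimate agrees with the original amplitudes; for an arbitrary signal it would only give the best fit in the squared-error sense.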

For simplicity, I’ll use the first 4 samples from the signal. Using the values of ys, fs, and ts from the previous section, we can run analyze1 like this:

n = len(fs)
amps2 = analyze1(ys[:n], fs, ts[:n])

And sure enough, amps2 is:

[ 0.6   0.25  0.1   0.05 ]

This algorithm works, but it is slow. Solving a linear system of equations takes time proportional to $n^3$, where n is the number of columns in M. We can do better.

Orthogonal Matrices

One way to solve linear systems is by inverting matrices. The inverse of a square matrix M is written $M^{-1}$, and it has the property that $M^{-1}M = I$. I is the identity matrix, which has the value 1 on all diagonal elements and 0 everywhere else.

So, to solve the equation $y = Ma$, we can multiply both sides by $M^{-1}$, which yields:

$$M^{-1}y = M^{-1}Ma$$

On the right side, we can replace $M^{-1}M$ with I:

$$M^{-1}y = Ia$$

If we multiply I by any vector a, the result is a, so:

$$M^{-1}y = a$$

This implies that if we can compute $M^{-1}$ efficiently, we can find a with a simple matrix multiplication (using np.dot). That takes time proportional to $n^2$, which is better than $n^3$.

Inverting a matrix is slow, in general, but some special cases are faster. In particular, if M is orthogonal, the inverse of M is just the transpose of M, written $M^T$. In NumPy, transposing an array is a constant-time operation. It doesn’t actually move the elements of the array; instead, it creates a “view” that changes the way the elements are accessed.

Again, a matrix is orthogonal if its transpose is also its inverse; that is, $M^T = M^{-1}$. That implies that $M^TM = I$, which means we can check whether a matrix is orthogonal by computing $M^TM$.

So let’s see what the matrix looks like in synthesize2. In the previous example, M has 11,025 rows, so it might be a good idea to work with a smaller example:

def test1():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    time_unit = 0.001
    ts = np.arange(N) / N * time_unit
    max_freq = N / time_unit / 2
    fs = np.arange(N) / N * max_freq
    ys = synthesize2(amps, fs, ts)

amps is the same vector of amplitudes we saw before. Since we have 4 frequency components, we’ll sample the signal at 4 points in time. That way, M is square.

ts is a vector of equally spaced sample times in the range from 0 to 1 time unit. I chose the time unit to be 1 millisecond, but it is an arbitrary choice, and we will see in a minute that it drops out of the computation anyway.

Since the frame rate is N samples per time unit, the Nyquist frequency is N / time_unit / 2, which is 2000 Hz in this example. So fs is a vector of equally spaced frequencies between 0 and 2000 Hz.

With these values of ts and fs, the matrix, M, is:

[[ 1.     1.     1.     1.   ]
 [ 1.     0.707  0.    -0.707]
 [ 1.     0.    -1.    -0.   ]
 [ 1.    -0.707 -0.     0.707]]

You might recognize 0.707 as an approximation of $\sqrt{2}/2$, which is $\cos(\pi/4)$. You also might notice that this matrix is symmetric, which means that the element at $(j, k)$ always equals the element at $(k, j)$. This implies that M is its own transpose; that is, $M^T = M$.

But sadly, M is not orthogonal. If we compute $M^TM$, we get:

[[ 4.  1. -0.  1.]
 [ 1.  2.  1. -0.]
 [-0.  1.  2.  1.]
 [ 1. -0.  1.  2.]]

And that’s not the identity matrix.

DCT-IV

But if we choose ts and fs carefully, we can make M orthogonal. There are several ways to do it, which is why there are several versions of the Discrete Cosine Transform (DCT).

One simple option is to shift ts and fs by a half unit. This version is called DCT-IV, where “IV” is a roman numeral indicating that this is the fourth of eight versions of the DCT.

Here’s an updated version of test1:

def test2():
    amps = np.array([0.6, 0.25, 0.1, 0.05])
    N = 4.0
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    ys = synthesize2(amps, fs, ts)

If you compare this to the previous version, you’ll notice two changes. First, I added 0.5 to ts and fs. Second, I canceled out time_unit, which simplifies the expression for fs.

With these values, M is:

[[ 0.981  0.831  0.556  0.195]
 [ 0.831 -0.195 -0.981 -0.556]
 [ 0.556 -0.981  0.195  0.831]
 [ 0.195 -0.556  0.831 -0.981]]

And $M^TM$ is:

[[ 2.  0.  0.  0.]
 [ 0.  2. -0.  0.]
 [ 0. -0.  2. -0.]
 [ 0.  0. -0.  2.]]

Some of the off-diagonal elements are displayed as –0, which means that the floating-point representation is a small negative number. This matrix is very close to $2I$, which means M is almost orthogonal; it’s just off by a factor of 2. And for our purposes, that’s good enough.
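The orthogonality check is easy to repeat numerically. The sketch below rebuilds M for a few sizes (the helper name cosine_matrix is hypothetical) and tests whether $M^TM$ is a multiple of the identity. For N = 4 the multiple should come out as 2, matching the matrix above; for other sizes it appears to be N/2, so the factor of 2 is tied to the four-sample example.

```python
import numpy as np

PI2 = 2 * np.pi

def cosine_matrix(N):
    """Build M with ts and fs shifted by half a unit, as in test2."""
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    return np.cos(PI2 * np.outer(ts, fs))

scales = {}
for N in [4, 8, 16]:
    M = cosine_matrix(N)
    product = np.dot(M.T, M)
    scale = product[0, 0]                         # candidate multiple of I
    is_scaled_identity = np.allclose(product, scale * np.eye(N))
    is_symmetric = np.allclose(M, M.T)
    scales[N] = scale
    print(N, scale, is_scaled_identity, is_symmetric)
```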

Because M is symmetric and (almost) orthogonal, the inverse of M is just $M/2$. Now we can write a more efficient version of analyze:

def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps

Instead of using np.linalg.solve, we just multiply by $M/2$.

Combining test2 and analyze2, we can write an implementation of DCT-IV:

def dct_iv(ys):
    N = len(ys)
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    args = np.outer(ts, fs)
    M = np.cos(PI2 * args)
    amps = np.dot(M, ys) / 2
    return amps

Again, ys is the wave array. We don’t have to pass ts and fs as parameters; dct_iv can figure them out based on N, the length of ys.

If we’ve got it right, this function should solve the analysis problem; that is, given ys it should be able to recover amps. We can test it like this:

amps = np.array([0.6, 0.25, 0.1, 0.05])
N = 4.0
ts = (0.5 + np.arange(N)) / N
fs = (0.5 + np.arange(N)) / 2
ys = synthesize2(amps, fs, ts)
amps2 = dct_iv(ys)
max(abs(amps - amps2))

Starting with amps, we synthesize a wave array, then use dct_iv to compute amps2. The biggest difference between amps and amps2 is about 1e-16, which is what we expect due to floating-point errors.

Inverse DCT

Finally, notice that analyze2 and synthesize2 are almost identical. The only difference is that analyze2 divides the result by 2. We can use this insight to compute the inverse DCT:

def inverse_dct_iv(amps):
    return dct_iv(amps) * 2

inverse_dct_iv solves the synthesis problem: it takes the vector of amplitudes and returns the wave array, ys. We can test it by starting with amps, applying inverse_dct_iv and dct_iv, and testing that we get back what we started with:

amps = [0.6, 0.25, 0.1, 0.05]
ys = inverse_dct_iv(amps)
amps2 = dct_iv(ys)
max(abs(amps - amps2))

Again, the biggest difference is about 1e-16.
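The round trips in this chapter use four samples, where dividing by 2 is the right scale. As a sketch of a length-independent variant, the code below assumes that for this choice of ts and fs the product of M with itself is (N/2) times the identity, so analysis scales by 2/N. The names dct_iv_matrix, dct_iv_scaled, and inverse_dct_iv_scaled are hypothetical and are not part of thinkdsp.

```python
import numpy as np

PI2 = 2 * np.pi

def dct_iv_matrix(N):
    """Cosine matrix with ts and fs shifted by half a unit."""
    ts = (0.5 + np.arange(N)) / N
    fs = (0.5 + np.arange(N)) / 2
    return np.cos(PI2 * np.outer(ts, fs))

def inverse_dct_iv_scaled(amps):
    """Synthesis: ys = M amps (hypothetical helper)."""
    M = dct_iv_matrix(len(amps))
    return np.dot(M, amps)

def dct_iv_scaled(ys):
    """Analysis: amps = (2/N) M ys, assuming M M = (N/2) I (hypothetical helper)."""
    N = len(ys)
    M = dct_iv_matrix(N)
    return np.dot(M, ys) * 2 / N

rng = np.random.default_rng(0)
amps = rng.uniform(-1, 1, size=64)     # 64 arbitrary amplitudes
ys = inverse_dct_iv_scaled(amps)
amps2 = dct_iv_scaled(ys)
print(np.max(np.abs(amps - amps2)))
```

With N = 4 the factor 2/N equals 1/2, so this variant agrees with the division by 2 used in analyze2.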

The Dct Class

thinkdsp provides a Dct class that encapsulates the DCT in the same way the Spectrum class encapsulates the FFT. To make a Dct object, you can invoke make_dct on a Wave:

signal = thinkdsp.TriangleSignal(freq=400)
wave = signal.make_wave(duration=1.0, framerate=10000)
dct = wave.make_dct()
dct.plot()

The result is the DCT of a triangle wave at 400 Hz, shown in Figure 6-2. The values of the DCT can be positive or negative; a negative value in the DCT corresponds to a negated cosine or, equivalently, to a cosine shifted by 180 degrees.

Figure 6-2. DCT of a triangle signal at 400 Hz, sampled at 10 kHz.

make_dct uses DCT-II, which is the most common type of DCT, provided by scipy.fftpack:

import scipy.fftpack

# class Wave:
    def make_dct(self):
        N = len(self.ys)
        hs = scipy.fftpack.dct(self.ys, type=2)
        fs = (0.5 + np.arange(N)) / 2
        return Dct(hs, fs, self.framerate)

The results from dct are stored in hs. The corresponding frequencies, computed as in “DCT-IV”, are stored in fs. And then both are used to initialize the Dct object.

Dct provides make_wave, which performs the inverse DCT. We can test it like this:

wave2 = dct.make_wave()
max(abs(wave.ys-wave2.ys))

The biggest difference between wave.ys and wave2.ys is about 1e-16, which is what we expect due to floating-point errors.

make_wave uses scipy.fftpack.idct:

# class Dct
    def make_wave(self):
        n = len(self.hs)
        ys = scipy.fftpack.idct(self.hs, type=2) / 2 / n
        return Wave(ys, framerate=self.framerate)

By default, the inverse DCT doesn’t normalize the result, so we have to divide through by 2N.
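The factor of 2N can be checked directly with scipy.fftpack, without going through the Wave and Dct classes. The sketch below uses an arbitrary six-element array; it also shows norm="ortho", an alternative scaling in which the forward and inverse transforms are exact inverses, so no extra division is needed.

```python
import numpy as np
import scipy.fftpack

ys = np.array([0.6, 0.25, 0.1, 0.05, -0.3, 0.2])   # arbitrary test values
n = len(ys)

# Unnormalized DCT-II followed by its unnormalized inverse:
# the round trip scales the input by 2n, so we divide it back out.
hs = scipy.fftpack.dct(ys, type=2)
ys2 = scipy.fftpack.idct(hs, type=2) / 2 / n

# With norm="ortho" the transform pair needs no extra scaling.
hs_ortho = scipy.fftpack.dct(ys, type=2, norm="ortho")
ys3 = scipy.fftpack.idct(hs_ortho, type=2, norm="ortho")

print(np.max(np.abs(ys - ys2)), np.max(np.abs(ys - ys3)))
```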

Exercises

For the following exercises, I provide some starter code in chap06starter.ipynb. Solutions are in chap06soln.ipynb.

Exercise 6-1.

In this chapter I claim that analyze1 takes time proportional to $n^3$ and analyze2 takes time proportional to $n^2$. To see if that’s true, run them on a range of input sizes and time them. In Jupyter, you can use the “magic command” %timeit.

If you plot run time versus input size on a log-log scale, you should get a straight line with slope 3 for analyze1 and slope 2 for analyze2.

You also might want to test dct_iv and scipy.fftpack.dct.

Exercise 6-2.

One of the major applications of the DCT is compression for both sound and images. In its simplest form, DCT-based compression works like this:

  1. Break a long signal into segments.

  2. Compute the DCT of each segment.

  3. Identify frequency components with amplitudes so low they are inaudible, and remove them. Store only the frequencies and amplitudes that remain.

  4. To play back the signal, load the frequencies and amplitudes for each segment and apply the inverse DCT.

Implement a version of this algorithm and apply it to a recording of music or speech. How many components can you eliminate before the difference is perceptible?

In order to make this method practical, you need some way to store a sparse array; that is, an array where most of the elements are zero. SciPy provides several implementations of sparse arrays, which you can read about at http://docs.scipy.org/doc/scipy/reference/sparse.html.

Exercise 6-3.

In the repository for this book you will find a Jupyter notebook called phase.ipynb that explores the effect of phase on sound perception. Read through this notebook and run the examples. Choose another segment of sound and run the same experiments. Can you find any general relationships between the phase structure of a sound and how we perceive it?

Chapter 7. Discrete Fourier Transform

We’ve been using the Discrete Fourier Transform (DFT) since Chapter 1, but I haven’t explained how it works. Now is the time.

If you understand the Discrete Cosine Transform (DCT), you will understand the DFT. The only difference is that instead of using the cosine function, we’ll use the complex exponential function. I’ll start by explaining complex exponentials, then we’ll follow the same progression as in Chapter 6:

  1. We’ll start with the synthesis problem: given a set of frequency components and their amplitudes, how can we construct a signal? The synthesis problem is equivalent to the inverse DFT.

  2. Then we’ll rewrite the synthesis problem in the form of matrix multiplication using NumPy arrays.

  3. Next we’ll solve the analysis problem, which is equivalent to the DFT: given a signal, how do we find the amplitude and phase offset of its frequency components?

  4. Finally, we’ll use linear algebra to find a more efficient way to compute the DFT.

The code for this chapter is in chap07.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp07.

Complex Exponentials

One of the more interesting moves in mathematics is the generalization of an operation from one type to another. For example, a factorial is a function that operates on integers; the natural definition for the factorial of n is the product of all integers from 1 to n.

If you are of a certain inclination, you might wonder how to compute the factorial of a non-integer like 3.5. Since the natural definition doesn’t apply, you might look for other ways to compute the factorial function, ways that would work with non-integers.

In 1730, Leonhard Euler found one, a generalization of the factorial function that we know as the gamma function (see http://en.wikipedia.org/wiki/Gamma_function).

Euler also found one of the most useful generalizations in applied mathematics, the complex exponential function.

The natural definition of exponentiation is repeated multiplication; for example, $\phi^3 = \phi \cdot \phi \cdot \phi$. But this definition doesn’t apply to non-integer exponents.

However, exponentiation can also be expressed as a power series:

$$e^{\phi} = 1 + \phi + \phi^2/2! + \phi^3/3! + \cdots$$

This definition works with real numbers, with imaginary numbers and, by a simple extension, with complex numbers. Applying this definition to a pure imaginary number, $i\phi$, we get:

$$e^{i\phi} = 1 + i\phi - \phi^2/2! - i\phi^3/3! + \cdots$$

By rearranging terms, we can show that this is equivalent to:

$$e^{i\phi} = \cos\phi + i\sin\phi$$

You can see the derivation at http://en.wikipedia.org/wiki/Euler's_formula.

This formula implies that $e^{i\phi}$ is a complex number with magnitude 1; if you think of it as a point in the complex plane, it is always on the unit circle. And if you think of it as a vector, the angle in radians between the vector and the positive x-axis is the argument ϕ.

In the case where the exponent is a complex number, we have:

$$e^{a + i\phi} = e^a e^{i\phi} = A e^{i\phi}$$

where A is a real number that indicates amplitude and $e^{i\phi}$ is a unit complex number that indicates angle.

NumPy provides a version of exp that works with complex numbers:

>>> phi = 1.5
>>> z = np.exp(1j * phi)
>>> z
(0.0707+0.997j)

Python uses j to represent the imaginary unit, rather than i. A number ending in j is considered imaginary, so 1j is just i.

When the argument to np.exp is imaginary or complex, the result is a complex number; specifically, an np.complex128, which is represented by two 64-bit floating-point numbers. In this example, the result is 0.0707+0.997j.

Complex numbers have attributes real and imag:

>>> z.real
0.0707
>>> z.imag
0.997

To get the magnitude, you can use the built-in function abs or np.absolute:

>>> abs(z)
1.0
>>> np.absolute(z)
1.0

To get the angle, you can use np.angle:

>>> np.angle(z)
1.5

This example confirms that $e^{i\phi}$ is a complex number with magnitude 1 and angle ϕ radians.
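The same ideas can be checked numerically. The sketch below compares np.exp(1j * phi) with the cosine-plus-sine form, sums the first 20 terms of the power series (the cutoff is an arbitrary choice), and confirms that a complex exponent a + iϕ gives magnitude $e^a$ and angle ϕ; the value a = 0.5 is illustrative.

```python
import math
import numpy as np

phi = 1.5

# Euler's formula: exp(i*phi) equals cos(phi) + i*sin(phi).
z = np.exp(1j * phi)
euler = np.cos(phi) + 1j * np.sin(phi)
print(np.isclose(z, euler))

# Truncated power series for exp(i*phi), using the first 20 terms.
series = sum((1j * phi) ** k / math.factorial(k) for k in range(20))
print(np.isclose(series, z))

# Complex exponent: exp(a + i*phi) has magnitude exp(a) and angle phi.
a = 0.5
w = np.exp(a + 1j * phi)
print(np.isclose(abs(w), np.exp(a)), np.isclose(np.angle(w), phi))
```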

Complex Signals

If $\phi(t)$ is a function of time, $e^{i\phi(t)}$ is also a function of time. Specifically:

$$e^{i\phi(t)} = \cos\phi(t) + i\sin\phi(t)$$

This function describes a quantity that varies in time, so it is a signal. Specifically, it is a complex exponential signal.

In the special case where the frequency of the signal is constant, $\phi(t)$ is $2\pi f t$ and the result is a complex sinusoid:

$$e^{i 2\pi f t} = \cos 2\pi f t + i\sin 2\pi f t$$

Or more generally, the signal might start at a phase offset $\phi_0$, yielding:

$$e^{i(2\pi f t + \phi_0)}$$

thinkdsp provides an implementation of this signal, ComplexSinusoid:

class ComplexSinusoid(Sinusoid):
 
   def evaluate(self, ts):
        phases = PI2 * self.freq * ts + self.offset
        ys = self.amp * np.exp(1j * phases)
        return ys

ComplexSinusoid inherits __init__ from Sinusoid. It provides a version of evaluate that is almost identical to Sinusoid.evaluate; the only difference is that it uses np.exp instead of np.sin.

The result is a NumPy array of complex numbers:

>>> signal = thinkdsp.ComplexSinusoid(freq=1, amp=0.6, offset=1)
>>> wave = signal.make_wave(duration=1, framerate=4)
>>> wave.ys
[ 0.324+0.505j -0.505+0.324j -0.324-0.505j  0.505-0.324j]

The frequency of this signal is 1 cycle per second; the amplitude is 0.6 (in unspecified units); and the phase offset is 1 radian.

This example evaluates the signal at four places equally spaced between 0 and 1 second. The resulting samples are complex numbers.
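The same four samples can be reproduced with NumPy alone. This is a minimal sketch, not the thinkdsp implementation: the helper name `complex_sinusoid` is hypothetical, and it assumes that `make_wave` starts sampling at t = 0 with a spacing of 1/framerate.

```python
import numpy as np

PI2 = 2 * np.pi

def complex_sinusoid(freq, amp, offset, ts):
    """Evaluate amp * exp(i * (2*pi*freq*t + offset)) at each time in ts."""
    phases = PI2 * freq * ts + offset
    return amp * np.exp(1j * phases)

# Four samples at framerate 4, starting at t = 0 (an assumption about make_wave).
ts = np.arange(4) / 4
ys = complex_sinusoid(freq=1, amp=0.6, offset=1, ts=ts)

print(np.round(ys, 3))
print(np.abs(ys))   # every sample has magnitude 0.6
```

Each sample has magnitude equal to the amplitude; only the angle changes from one sample to the next.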

The Synthesis Problem

Just as we did with real sinusoids, we can create compound signals by adding up complex sinusoids with different frequencies. And that brings us to the complex version of the synthesis problem: given the frequency and amplitude of each complex component, how do we evaluate the signal?

The simplest solution is to create ComplexSinusoid objects and add them up:

def synthesize1(amps, fs, ts):
    components = [thinkdsp.ComplexSinusoid(freq, amp)
                  for amp, freq in zip(amps, fs)]
    signal = thinkdsp.SumSignal(*components)
    ys = signal.evaluate(ts)
    return ys

This function is almost identical to synthesize1 in “Synthesis”; the only difference is that I replaced CosSignal with ComplexSinusoid.

Here’s an example:

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = [100, 200, 300, 400]
framerate = 11025
ts = np.linspace(0, 1, framerate)
ys = synthesize1(amps, fs, ts)

The result is:

[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]

At the lowest level, a complex signal is a sequence of complex numbers. But how should we interpret it? We have some intuition for real signals: they represent quantities that vary in time; for example, a sound signal represents changes in air pressure. But nothing we measure in the world yields complex numbers.

So what is a complex signal? I don’t have a satisfying answer to this question. The best I can offer is two unsatisfying answers:

  1. A complex signal is a mathematical abstraction that is useful for computation and analysis, but it does not correspond directly with anything in the real world.

  2. If you like, you can think of a complex signal as a sequence of complex numbers that contains two signals as its real and imaginary parts.

Taking the second point of view, we can split the previous signal into its real and imaginary parts:

n = 500
thinkplot.plot(ts[:n], ys[:n].real, label='real')
thinkplot.plot(ts[:n], ys[:n].imag, label='imag')

Figure 7-1 shows a segment of the result. The real part is a sum of cosines; the imaginary part is a sum of sines. Although the waveforms look different, they contain the same frequency components in the same proportions. To our ears, they sound the same (in general, we don’t hear phase offsets).
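The claim about the real and imaginary parts can be checked numerically. The following is a sketch that builds the mixture directly with NumPy rather than with thinkdsp; the variable names `cos_sum` and `sin_sum` are introduced here only for illustration.

```python
import numpy as np

PI2 = 2 * np.pi
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.array([100, 200, 300, 400])
ts = np.linspace(0, 1, 11025)

# Complex mixture: sum of amp * exp(i * 2*pi*f*t) over the components.
ys = sum(amp * np.exp(1j * PI2 * f * ts) for amp, f in zip(amps, fs))

# The same components as real cosines and real sines.
cos_sum = sum(amp * np.cos(PI2 * f * ts) for amp, f in zip(amps, fs))
sin_sum = sum(amp * np.sin(PI2 * f * ts) for amp, f in zip(amps, fs))

print(np.allclose(ys.real, cos_sum))   # real part is the sum of cosines
print(np.allclose(ys.imag, sin_sum))   # imaginary part is the sum of sines
```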

Figure 7-1. Real and imaginary parts of a mixture of complex sinusoids.

Synthesis with Matrices

As we saw in “Synthesis with Arrays”, we can also express the synthesis problem in terms of matrix multiplication:

PI2 = np.pi * 2

def synthesize2(amps, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    ys = np.dot(M, amps)
    return ys

Again, amps is a NumPy array that contains a sequence of amplitudes.

fs is a sequence containing the frequencies of the components. ts contains the times where we will evaluate the signal.

args contains the outer product of ts and fs, with the ts running down the rows and the fs running across the columns (you might want to refer back to Figure 6-1).

Each column of matrix M contains a complex sinusoid with a particular frequency, evaluated at a sequence of ts.

When we multiply M by the amplitudes, the result is a vector whose elements correspond to the ts; each element is the sum of several complex sinusoids, evaluated at a particular time.
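To make the shapes concrete, here is a small sketch that repeats the steps of synthesize2 and inspects the intermediate arrays. The variable `column_ok` is introduced only for this check.

```python
import numpy as np

PI2 = 2 * np.pi
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.array([100, 200, 300, 400])
ts = np.linspace(0, 1, 11025)

args = np.outer(ts, fs)        # shape (len(ts), len(fs)): times down rows, freqs across columns
M = np.exp(1j * PI2 * args)    # one complex sinusoid per column
ys = np.dot(M, amps)           # one sample per row of M

# Column 1 is the 200 Hz complex sinusoid evaluated at every time in ts.
column_ok = np.allclose(M[:, 1], np.exp(1j * PI2 * fs[1] * ts))

print(args.shape, M.shape, ys.shape, column_ok)
```

With 11025 times and 4 frequencies, M has 11025 rows and 4 columns, and the product with amps has one element per time.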

Here’s the example from the previous section again:

>>> ys = synthesize2(amps, fs, ts)
>>> ys
[ 1.000 +0.000e+00j  0.995 +9.093e-02j  0.979 +1.803e-01j ...,
  0.979 -1.803e-01j  0.995 -9.093e-02j  1.000 -5.081e-15j]

The result is the same.

In this example the amplitudes are real, but they could also be complex. What effect does a complex amplitude have on the result? Remember that we can think of a complex number in two ways: either the sum of a real and an imaginary part, $x + iy$, or the product of a real amplitude and a complex exponential, $A e^{i \phi_0}$. Using the second interpretation, we can see what happens when we multiply a complex amplitude by a complex sinusoid. For each frequency, $f$, we have:

$$A e^{i \phi_0} \cdot e^{i 2 \pi f t} = A e^{i (2 \pi f t + \phi_0)}$$

Multiplying by $A e^{i \phi_0}$ multiplies the amplitude by $A$ and adds the phase offset $\phi_0$.

We can test that claim by running the previous example with complex amplitudes:

phi = 1.5
amps2 = amps * np.exp(1j * phi)
ys2 = synthesize2(amps2, fs, ts)

thinkplot.plot(ts[:n], ys.real[:n])
thinkplot.plot(ts[:n], ys2.real[:n])

Since amps is an array of reals, multiplying by np.exp(1j * phi) yields an array of complex numbers with phase offset phi radians, and the same magnitudes as amps.

Figure 7-2 shows waveforms with different phase offsets. With $\phi_0 = 1.5$, each frequency component gets shifted by about a quarter of a cycle. But components with different frequencies have different periods; as a result, each component is shifted by a different amount in time. When we add up the components, the resulting waveforms look different.
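The effect of a common phase offset can also be checked without plotting. This sketch redefines synthesize2 so it is self-contained; the name `time_shifts` is introduced here for illustration, and it uses the identity that a phase offset of phi radians at frequency f corresponds to a time shift of phi / (2 pi f).

```python
import numpy as np

PI2 = 2 * np.pi

def synthesize2(amps, fs, ts):
    M = np.exp(1j * PI2 * np.outer(ts, fs))
    return np.dot(M, amps)

amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.array([100, 200, 300, 400])
ts = np.linspace(0, 1, 11025)

phi = 1.5
amps2 = amps * np.exp(1j * phi)
ys = synthesize2(amps, fs, ts)
ys2 = synthesize2(amps2, fs, ts)

# Magnitudes are unchanged, and the whole complex signal is rotated by phi.
print(np.allclose(np.abs(amps2), amps))
print(np.allclose(ys2, ys * np.exp(1j * phi)))

# The same phase offset is a different time shift for each frequency.
time_shifts = phi / (PI2 * fs)
print(time_shifts)
```

The rotation is the same for every component, but the real parts of `ys` and `ys2` differ in shape because each component moves by a different amount of time.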

Now that we have the more general solution to the synthesis problem—one that handles complex amplitudes—we are ready for the analysis problem.

Figure 7-2. Real part of two complex signals that differ by a phase offset.

The Analysis Problem

The analysis problem is the inverse of the synthesis problem: given a sequence of samples, y, and knowing the frequencies that make up the signal, can we compute the complex amplitudes of the components, a?

As we saw in “Analysis”, we can solve this problem by forming the synthesis matrix, $M$, and solving the system of linear equations, $M a = y$, for $a$:

def analyze1(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = np.linalg.solve(M, ys)
    return amps

analyze1 takes a (possibly complex) wave array, ys, a sequence of real frequencies, fs, and a sequence of real times, ts. It returns a sequence of complex amplitudes, amps.

Continuing the previous example, we can confirm that analyze1 recovers the amplitudes we started with. For the linear system solver to work, M has to be square, so we need ys, fs, and ts to have the same length. I’ll ensure that by slicing ys and ts down to the length of fs:

>>> n = len(fs)
>>> amps2 = analyze1(ys[:n], fs, ts[:n])
>>> amps2
[ 0.60+0.j  0.25-0.j  0.10+0.j  0.05-0.j]

These are approximately the amplitudes we started with, although each component has a small imaginary part due to floating-point errors.
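The same round trip can be tried with complex amplitudes, which is the case analyze1 is meant to handle. This is a sketch under the assumptions of the example above (same fs and ts); the name `recovered` and the choice of a 1.5 radian rotation are illustrative.

```python
import numpy as np

PI2 = 2 * np.pi

def synthesize2(amps, fs, ts):
    M = np.exp(1j * PI2 * np.outer(ts, fs))
    return np.dot(M, amps)

def analyze1(ys, fs, ts):
    M = np.exp(1j * PI2 * np.outer(ts, fs))
    return np.linalg.solve(M, ys)

fs = np.array([100, 200, 300, 400])
ts = np.linspace(0, 1, 11025)

# Complex amplitudes: the magnitudes from the text, each rotated by 1.5 radians.
amps = np.array([0.6, 0.25, 0.1, 0.05]) * np.exp(1j * 1.5)
ys = synthesize2(amps, fs, ts)

n = len(fs)                                   # M must be square, so keep n samples
recovered = analyze1(ys[:n], fs, ts[:n])

print(np.allclose(recovered, amps))
print(np.abs(recovered), np.angle(recovered))
```

Comparing with np.allclose rather than exact equality allows for the floating-point error mentioned above.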

Efficient Analysis

Unfortunately, solving a linear system of equations is slow. For the DCT, we were able to speed things up by choosing fs and ts so that M is orthogonal. That way, the inverse of M is the transpose of M, and we can compute both the DCT and inverse DCT by matrix multiplication.

We’ll do the same thing for the DFT, with one small change. Since M is complex, we need it to be unitary, rather than orthogonal, which means that the inverse of M is the conjugate transpose of M, which we can compute by transposing the matrix and negating the imaginary part of each element. See http://en.wikipedia.org/wiki/Unitary_matrix.

The NumPy methods conj and transpose do what we want. Here’s the code that computes M for $N = 4$ components:

N = 4
ts = np.arange(N) / N
fs = np.arange(N)
args = np.outer(ts, fs)
M = np.exp(1j * PI2 * args)

If M is unitary, $M^* M = I$, where $M^*$ is the conjugate transpose of M, and $I$ is the identity matrix. We can test whether M is unitary like this:

MstarM = M.conj().transpose().dot(M)

The result, within the tolerance of floating-point error, is $4 I$, so M is unitary except for an extra factor of N, similar to the extra factor of 2 we found with the DCT.
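Rather than reading the matrix by eye, the comparison can be made explicit. This sketch repeats the check for a few sizes; the helper name `make_M` is hypothetical and simply wraps the code shown above.

```python
import numpy as np

PI2 = 2 * np.pi

def make_M(N):
    """Build the N-by-N synthesis matrix for ts = k/N and fs = 0..N-1."""
    ts = np.arange(N) / N
    fs = np.arange(N)
    return np.exp(1j * PI2 * np.outer(ts, fs))

for N in [4, 8, 16]:
    M = make_M(N)
    MstarM = M.conj().transpose().dot(M)
    # Expect N times the identity, up to floating-point error.
    print(N, np.allclose(MstarM, N * np.eye(N)))
```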

We can use this result to write a faster version of analyze1:

def analyze2(ys, fs, ts):
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    amps = M.conj().transpose().dot(ys) / N
    return amps

And test it with appropriate values of fs and ts:

N = 4
amps = np.array([0.6, 0.25, 0.1, 0.05])
fs = np.arange(N)
ts = np.arange(N) / N
ys = synthesize2(amps, fs, ts)
amps3 = analyze2(ys, fs, ts)

Again, the result is correct within the tolerance of floating-point arithmetic:

[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]

DFT

As a function, analyze2 would be hard to use because it only works if fs and ts are chosen correctly. Instead, I will rewrite it to take just ys and compute fs and ts itself.

First, I’ll make a function to compute the synthesis matrix, M:

def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    M = np.exp(1j * PI2 * args)
    return M

Then I’ll write the function that takes ys and returns amps:

def analyze3(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys) / N
    return amps

We are almost done; analyze3 computes something very close to the DFT, with one difference. The conventional definition of the DFT does not divide by N:

def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    amps = M.conj().transpose().dot(ys)
    return amps

Now we can confirm that my version yields the same result as np.fft.fft:

>>> dft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]

The result is close to amps * N. And here’s the version in np.fft:

>>> np.fft.fft(ys)
[ 2.4+0.j  1.0+0.j  0.4-0.j  0.2-0.j]

They are the same, within floating-point error.
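The four-element example uses real amplitudes, so it is worth repeating the comparison on a longer complex input. This sketch restates synthesis_matrix and dft so it runs on its own; the random test data and the seed are arbitrary choices.

```python
import numpy as np

PI2 = 2 * np.pi

def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    args = np.outer(ts, fs)
    return np.exp(1j * PI2 * args)

def dft(ys):
    N = len(ys)
    M = synthesis_matrix(N)
    return M.conj().transpose().dot(ys)

# Arbitrary complex test signal with 16 samples.
rng = np.random.default_rng(0)
ys = rng.normal(size=16) + 1j * rng.normal(size=16)

print(np.allclose(dft(ys), np.fft.fft(ys)))
```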

The inverse DFT is almost the same, except we don’t have to transpose and conjugate M, and now we have to divide through by N:

def idft(amps):
    N = len(amps)
    M = synthesis_matrix(N)
    # Synthesis: multiply by M, then divide by N to undo the factor from dft
    ys = M.dot(amps) / N
    return ys

Finally, we can confirm that dft(idft(amps)) yields amps:

>>> ys = idft(amps)
>>> dft(ys)
[ 0.60+0.j  0.25+0.j  0.10-0.j  0.05-0.j]

If I could go back in time, I might change the definition of the DFT so it divides by N and the inverse DFT doesn’t. That would be more consistent with my presentation of the synthesis and analysis problems.

Or I might change the definition so that both operations divide through by $\sqrt{N}$. Then the DFT and inverse DFT would be more symmetric.

But I can’t go back in time (yet!), so we’re stuck with a slightly weird convention. For practical purposes it doesn’t really matter.
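The symmetric convention is available in NumPy through the `norm="ortho"` argument of np.fft.fft. The sketch below compares it with a matrix version that divides by the square root of N in both directions; the function names `dft_ortho` and `idft_ortho` are hypothetical, introduced only for this illustration.

```python
import numpy as np

PI2 = 2 * np.pi

def synthesis_matrix(N):
    ts = np.arange(N) / N
    fs = np.arange(N)
    return np.exp(1j * PI2 * np.outer(ts, fs))

def dft_ortho(ys):
    """Analysis with the symmetric 1/sqrt(N) scaling."""
    N = len(ys)
    M = synthesis_matrix(N)
    return M.conj().transpose().dot(ys) / np.sqrt(N)

def idft_ortho(amps):
    """Synthesis with the same 1/sqrt(N) scaling."""
    N = len(amps)
    M = synthesis_matrix(N)
    return M.dot(amps) / np.sqrt(N)

ys = np.array([1.0, 0.5, -0.25, 0.0, 2.0, -1.0, 0.75, 0.1])
hs = dft_ortho(ys)

print(np.allclose(hs, np.fft.fft(ys, norm="ortho")))          # matches NumPy's option
print(np.allclose(idft_ortho(hs), ys))                        # round trip
print(np.isclose(np.linalg.norm(hs), np.linalg.norm(ys)))     # length is preserved
```

With this scaling the matrix is exactly unitary, which is why the transform preserves the Euclidean length of the array.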

The DFT Is Periodic

In this chapter I presented the DFT in the form of matrix multiplication. We compute the synthesis matrix, $M$, and the analysis matrix, $M^*$. When we multiply $M^*$ by the wave array, $y$, each element of the result is the product of a row from $M^*$ and $y$, which we can write in the form of a summation:

$$DFT(y)[k] = \sum_{n} y[n] \exp(-2 \pi i n k / N)$$

where $k$ is an index of frequency from 0 to $N-1$ and $n$ is an index of time from 0 to $N-1$. So $DFT(y)[k]$ is the $k$th element of the DFT of $y$.

Normally we evaluate this summation for $N$ values of $k$, running from 0 to $N-1$. We could evaluate it for other values of $k$, but there is no point, because they start to repeat. That is, the value at $k$ is the same as the value at $k+N$ or $k+2N$ or $k-N$, etc.

We can see that mathematically by plugging $k+N$ into the summation:

$$DFT(y)[k+N] = \sum_{n} y[n] \exp(-2 \pi i n (k+N) / N)$$

Since there is a sum in the exponent, we can break it into two parts:

$$DFT(y)[k+N] = \sum_{n} y[n] \exp(-2 \pi i n k / N) \exp(-2 \pi i n N / N)$$

In the second term, the exponent is always an integer multiple of $2 \pi$, so the result is always 1, and we can drop it:

$$DFT(y)[k+N] = \sum_{n} y[n] \exp(-2 \pi i n k / N)$$

And we can see that this summation is equivalent to $DFT(y)[k]$. So the DFT is periodic, with period $N$. You will need this result for one of the exercises at the end of this chapter, which asks you to implement the Fast Fourier Transform (FFT).
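The periodicity can be confirmed by evaluating the summation directly for values of k outside the usual range. In this sketch the function name `dft_element` and the test data are illustrative choices.

```python
import numpy as np

PI2 = 2 * np.pi

def dft_element(ys, k):
    """Evaluate the DFT summation for a single (possibly out-of-range) k."""
    N = len(ys)
    n = np.arange(N)
    return np.sum(ys * np.exp(-1j * PI2 * n * k / N))

ys = np.array([0.6, -0.25, 0.1, 0.05, 1.2, -0.7])
N = len(ys)

for k in range(N):
    same_up = np.isclose(dft_element(ys, k), dft_element(ys, k + N))
    same_down = np.isclose(dft_element(ys, k), dft_element(ys, k - N))
    print(k, same_up, same_down)
```

For k from 0 to N-1 the values agree with np.fft.fft, and shifting k by any multiple of N leaves them unchanged.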

As an aside, writing the DFT in the form of a summation provides an insight into how it works. If you review the diagram in “Synthesis with Arrays”, you’ll see that each column of the synthesis matrix is a signal evaluated at a sequence of times. The analysis matrix is the (conjugate) transpose of the synthesis matrix, so each row is a signal evaluated at a sequence of times.

Therefore, each summation is the correlation of y with one of the signals in the array (see “Correlation as Dot Product”). That is, each element of the DFT is a correlation that quantifies the similarity of the wave array, y, and a complex exponential at a particular frequency.

DFT of Real Signals

The Spectrum class in thinkdsp is based on np.fft.rfft, which computes the “real DFT”; that is, it works with real signals. But the DFT as presented in this chapter is more general than that; it works with complex signals.

So what happens when we apply the “full DFT” to a real signal? Let’s look at an example:

signal = thinkdsp.SawtoothSignal(freq=500)
wave = signal.make_wave(duration=0.1, framerate=10000)
hs = dft(wave.ys)
amps = np.absolute(hs)

This code makes a sawtooth wave with frequency 500 Hz, sampled at frame rate 10 kHz. hs contains the complex DFT of the wave; amps contains the amplitude at each frequency. But what frequency do these amplitudes correspond to? If we look at the body of dft, we see:

fs = np.arange(N)

It’s tempting to think that these values are the right frequencies. The problem is that dft doesn’t know the sampling rate. The DFT assumes that the duration of the wave is 1 time unit, so it thinks the sampling rate is N per time unit. In order to interpret the frequencies, we have to convert from these arbitrary time units back to seconds, like this:

fs = np.arange(N) * framerate / N

With this change, the range of frequencies is from 0 to the actual frame rate, 10 kHz. Now we can plot the spectrum:

thinkplot.plot(fs, amps)
thinkplot.config(xlabel='frequency (Hz)', 
                 ylabel='amplitude')

Figure 7-3 shows the amplitude of the signal for each frequency component from 0 to 10 kHz. The left half of the figure is what we should expect: the dominant frequency is at 500 Hz, with harmonics dropping off like $1/f$.

Figure 7-3. DFT of a 500 Hz sawtooth signal sampled at 10 kHz.

But the right half of the figure is a surprise. Past 5000 Hz, the amplitude of the harmonics starts growing again, peaking at 9500 Hz. What’s going on?

The answer: aliasing. Remember that with frame rate 10,000 Hz, the folding frequency is 5000 Hz. As we saw in “Aliasing”, a component at 5500 Hz is indistinguishable from a component at 4500 Hz. When we evaluate the DFT at 5500 Hz, we get the same value as at 4500 Hz. Similarly, the value at 6000 Hz is the same as the one at 4000 Hz, and so on.

The DFT of a real signal is symmetric around the folding frequency. Since there is no additional information past this point, we can save time by evaluating only the first half of the DFT, and that’s exactly what np.fft.rfft does.
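The symmetry can be verified on any real array. This sketch uses arbitrary random data rather than the sawtooth wave; for real input, element k of the full DFT is the complex conjugate of element N-k, so the magnitudes mirror each other and np.fft.rfft returns only the first N//2 + 1 elements.

```python
import numpy as np

rng = np.random.default_rng(1)
ys = rng.normal(size=10)          # a real signal with N = 10 samples
N = len(ys)

hs = np.fft.fft(ys)               # full DFT, N complex values

# Conjugate symmetry: hs[k] == conj(hs[N - k]) for k = 1..N-1.
k = np.arange(1, N)
print(np.allclose(hs[k], np.conj(hs[N - k])))

# So the amplitudes are mirrored around the folding frequency.
amps = np.abs(hs)
print(np.allclose(amps[1:], amps[1:][::-1]))

# rfft keeps only the non-redundant first half.
half = np.fft.rfft(ys)
print(len(half), np.allclose(half, hs[:N // 2 + 1]))
```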

Exercises

Solutions to these exercises are in chap07soln.ipynb.

Exercise 7-1.

The notebook for this chapter, chap07.ipynb, contains additional examples and explanations. Read through it and run the code.

Exercise 7-2.

In this chapter, I showed how we can express the DFT and inverse DFT as matrix multiplications. These operations take time proportional to $N^2$, where $N$ is the length of the wave array. That is fast enough for many applications, but there is a faster algorithm, the Fast Fourier Transform (FFT), which takes time proportional to $N \log N$.

The key to the FFT is the Danielson–Lanczos lemma:

$$DFT(y)[n] = DFT(e)[n] + \exp(-2 \pi i n / N) \, DFT(o)[n]$$

where $DFT(y)[n]$ is the $n$th element of the DFT of $y$, $e$ is a wave array containing the even elements of $y$, and $o$ contains the odd elements of $y$.

This lemma suggests a recursive algorithm for the DFT:

  1. Given a wave array, y, split it into its even elements, e, and its odd elements, o.

  2. Compute the DFT of e and o by making recursive calls.

  3. Compute $DFT(y)[n]$ for each value of n using the Danielson–Lanczos lemma.

For the base case of this recursion, you could wait until the length of y is 1. In that case, $DFT(y) = y$. Or if the length of y is sufficiently small, you could compute its DFT by matrix multiplication, possibly using a precomputed matrix.

Hint: I suggest you implement this algorithm incrementally by starting with a version that is not truly recursive. In Step 2, instead of making a recursive call, use dft, as defined in “DFT”, or np.fft.fft. Get Step 3 working, and confirm that the results are consistent with the other implementations. Then add a base case and confirm that it works. Finally, replace Step 2 with recursive calls.

One more hint: remember that the DFT is periodic; you might find np.tile useful.

You can read more about the FFT at https://en.wikipedia.org/wiki/Fast_Fourier_transform.

Chapter 8. Filtering and Convolution

In this chapter I present one of the most important and useful ideas related to signal processing: the Convolution Theorem. But before we can understand the Convolution Theorem, we have to understand convolution. I’ll start with a simple example, smoothing, and we’ll go from there.

The code for this chapter is in chap08.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp08.

Smoothing

Smoothing is an operation that tries to remove short-term variations from a signal in order to reveal long-term trends. For example, if you were to plot daily changes in the price of a stock, it would look noisy; a smoothing operator might make it easier to see whether the price was generally going up or down over time.

A common smoothing algorithm is a moving average, which computes the mean of the previous n values, for some value of n.

For example, Figure 8-1 shows the daily closing price of Facebook stock from May 17, 2012 to December 8, 2015. The gray line is the raw data, and the darker line shows the 30-day moving average. Smoothing removes the most extreme changes and makes it easier to see long-term trends.

Figure 8-1. Daily closing price of Facebook stock and a 30-day moving average.

Smoothing operations also apply to sound signals. As an example, I’ll start with a square wave at 440 Hz. As we saw in “Square Waves”, the harmonics of a square wave drop off slowly, so it contains many high-frequency components.

First I’ll construct the signal and two waves:

signal = thinkdsp.SquareSignal(freq=440)
wave = signal.make_wave(duration=1, framerate=44100)
segment = wave.segment(duration=0.01)

wave is a one-second slice of the signal; segment is a shorter slice I’ll use for plotting.

To compute the moving average of this signal, I’ll use a window similar to the ones in “Windowing”. Previously we used a Hamming window to avoid spectral leakage caused by discontinuity at the beginning and end of a signal. More generally, we can use windows to compute the weighted sum of samples in a wave.

For example, to compute a moving average, I’ll create a window with 11 elements and normalize it so the elements add up to 1:

window = np.ones(11)
window /= sum(window)

Now I can compute the average of the first 11 elements by multiplying the window by the wave array:

ys = segment.ys
N = len(ys)
padded = thinkdsp.zero_pad(window, N)
prod = padded * ys
sum(prod)

padded is a version of the window with zeros added to the end so it is the same length as segment.ys. Adding zeros like this is called padding.

prod is the product of the window and the wave array. The sum of the elementwise products is the average of the first 11 elements of the array. Since these elements are all –1, their average is –1.

To compute the next element of the moving average, we roll the window, which shifts the ones to the right and wraps one of the zeros from the end around to the beginning.

When we multiply the rolled window and the wave array, we get the average of the next 11 elements of the wave array, starting with the second:

rolled = np.roll(padded, 1)
prod = rolled * ys
sum(prod)

The result is –1 again.
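Because the square wave is constant over the first several samples, it is hard to see from the value –1 which elements the window selects. The sketch below uses a ramp and a shorter window instead, so each position of the window gives a different mean. The helper `zero_pad` is a stand-in written for this example, assumed to behave like thinkdsp.zero_pad by appending zeros.

```python
import numpy as np

def zero_pad(array, n):
    """Stand-in for thinkdsp.zero_pad: extend array with zeros to length n."""
    res = np.zeros(n)
    res[:len(array)] = array
    return res

ys = np.arange(20, dtype=float)      # ramp: 0, 1, 2, ..., 19
window = np.ones(5)
window /= sum(window)                # 5-element moving average

padded = zero_pad(window, len(ys))
first = sum(padded * ys)             # mean of ys[0:5]

rolled = np.roll(padded, 1)          # shift the window one place to the right
second = sum(rolled * ys)            # mean of ys[1:6]

print(first, second)
```

The first product averages elements 0 through 4 and the second averages elements 1 through 5, which is the one-step shift described above.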

We can compute the rest of the elements the same way. The following function wraps the code we have seen so far in a loop and stores the results in an array:

def smooth(ys, window):
    N = len(ys)
    smoothed = np.zeros(N)
    padded = thinkdsp.zero_pad(window, N)
    rolled = padded

    for i in range(N):
        smoothed[i] = sum(rolled * ys)
        rolled = np.roll(rolled, 1)
    return smoothed

smoothed is the array that will contain the results; padded is an array that contains the window and enough zeros to have length N; and rolled is a copy of padded that gets shifted to the right by one element each time through the loop.

Inside the loop, we multiply ys by rolled to select 11 elements and add them up.

Figure 8-2 shows the result for a square wave. The gray line is the original signal; the darker line is the smoothed signal. The smoothed signal starts to ramp up when the leading edge of the window reaches the first transition, and levels off when the window crosses the transition. As a result, the transitions are less abrupt, and the corners less sharp. If you listen to the smoothed signal, it sounds less buzzy and slightly muffled.

Figure 8-2. A square signal at 440 Hz (gray) and an 11-element moving average.

Convolution

The operation we just performed—applying a window function to each overlapping segment of a wave—is called convolution.

Convolution is such a common operation that NumPy provides an implementation that is simpler and faster than my version:

convolved = np.convolve(ys, window, mode='valid')
smooth2 = thinkdsp.Wave(convolved, framerate=wave.framerate)

np.convolve computes the convolution of the wave array and the window. The mode flag valid indicates that it should only compute values when the window and the wave array overlap completely, so it stops when the right edge of the window reaches the end of the wave array. Other than that, the result is the same as in Figure 8-2.
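The relationship between the loop and np.convolve can be checked directly. This sketch restates smooth with NumPy only (zero padding is done inline instead of with thinkdsp.zero_pad) and uses a synthetic square-like array rather than the 440 Hz wave; the name `n_valid` is introduced for illustration.

```python
import numpy as np

def smooth(ys, window):
    """Moving average by rolling a zero-padded window across ys."""
    N = len(ys)
    padded = np.zeros(N)
    padded[:len(window)] = window
    smoothed = np.zeros(N)
    rolled = padded
    for i in range(N):
        smoothed[i] = np.sum(rolled * ys)
        rolled = np.roll(rolled, 1)
    return smoothed

# Square-like test array: 25 samples at -1, then 25 at +1, repeated.
idx = np.arange(100)
ys = np.where((idx // 25) % 2 == 0, -1.0, 1.0)

window = np.ones(11)
window /= sum(window)

looped = smooth(ys, window)
convolved = np.convolve(ys, window, mode='valid')

n_valid = len(ys) - len(window) + 1   # positions where the window fits entirely
print(len(convolved) == n_valid)
print(np.allclose(looped[:n_valid], convolved))
```

The loop produces N values, the last few of which use a window that has wrapped around; the 'valid' result stops before that happens.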

Actually, there is one other difference. The loop in the previous section actually computes cross-correlation:

$$(f \star g)[n] = \sum_{m=0}^{N-1} f[m] \, g[n+m]$$

where $f$ is a wave array with length $N$, $g$ is the window, and $\star$ is the symbol for cross-correlation. To compute the $n$th element of the result, we shift $g$ to the right, which is why the index is $n+m$.

The definition of convolution is slightly different:

$$(f * g)[n] = \sum_{m=0}^{N-1} f[m] \, g[n-m]$$

The symbol $*$ represents convolution. The difference is in the index of $g$: $m$ has been negated, so the summation iterates the elements of $g$ backward (assuming that negative indices wrap around to the end of the array).

Because the window we used in this example is symmetric, cross-correlation and convolution yield the same result. When we use other windows, we will have to be more careful.
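An asymmetric window makes the difference visible. The sketch below uses np.correlate and np.convolve with made-up data; it relies on the fact that convolving with a reversed window is the same as cross-correlating with the original window.

```python
import numpy as np

ys = np.arange(8, dtype=float) ** 2        # 0, 1, 4, 9, ...
window = np.array([1.0, 2.0, 3.0])         # deliberately asymmetric

corr = np.correlate(ys, window, mode='valid')
conv = np.convolve(ys, window, mode='valid')

# With an asymmetric window the two operations disagree...
print(corr[0], conv[0])

# ...but reversing the window turns one into the other.
print(np.allclose(corr, np.convolve(ys, window[::-1], mode='valid')))

# With a symmetric window they agree.
sym = np.ones(3) / 3
print(np.allclose(np.correlate(ys, sym, mode='valid'),
                  np.convolve(ys, sym, mode='valid')))
```

Here the first cross-correlation element is 0·1 + 1·2 + 4·3 = 14, while the first convolution element is 0·3 + 1·2 + 4·1 = 6.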

You might wonder why convolution is defined like this, with the window applied in a way that seems backward. There are two reasons:

  • This definition comes up naturally for several applications, especially analysis of signal-processing systems, which is the topic of Chapter 10.

  • Also, this definition is the basis of the Convolution Theorem, coming up very soon.

Finally, a note for people who know too much: in the presentation so far I have not distinguished between convolution and circular convolution. We’ll get to it.

The Frequency Domain

Smoothing makes the transitions in a square signal less abrupt, and makes the sound slightly muffled. Let’s see what effect this operation has on the spectrum. First I’ll plot the spectrum of the original wave:

spectrum = wave.make_spectrum()
spectrum.plot(color=GRAY)

Then the smoothed wave:

convolved = np.convolve(wave.ys, window, mode='same')
smooth = thinkdsp.Wave(convolved, framerate=wave.framerate)
spectrum2 = smooth.make_spectrum()
spectrum2.plot()

The mode flag same indicates that the result should have the same length as the input. In this example, it will include a few values that “wrap around”, but that’s OK for now.

Figure 8-3 shows the result. The fundamental frequency is almost unchanged; the first few harmonics are attenuated, and the higher harmonics are almost eliminated. So smoothing has the effect of a low-pass filter, which we saw in “Spectrums” and “Pink Noise”.

Figure 8-3. Spectrum of the square wave before and after smoothing.

To see how much each component has been attenuated, we can compute the ratio of the two spectrums:

amps = spectrum.amps
amps2 = spectrum2.amps
ratio = amps2 / amps    
ratio[amps<560] = 0
thinkplot.plot(ratio)

ratio is the ratio of the amplitude before and after smoothing. When amps is small, this ratio can be big and noisy, so for simplicity I set the ratio to 0 except where the harmonics are.

Figure 8-4 shows the result. As expected, the ratio is high for low frequencies and drops off at a cutoff frequency near 4000 Hz. But there is another feature we did not expect: above the cutoff, the ratio bounces around between 0 and 0.2. What’s up with that?

Figure 8-4. Ratio of spectrums for the square wave, before and after smoothing.

The Convolution Theorem

The answer is the Convolution Theorem. Stated mathematically:

$$\mathrm{DFT}(f * g) = \mathrm{DFT}(f) \cdot \mathrm{DFT}(g)$$

where $f$ is a wave array and $g$ is a window. In words, the Convolution Theorem says that if we convolve $f$ and $g$ and then compute the DFT, we get the same answer as when we compute the DFTs of $f$ and $g$ separately and then multiply the results elementwise.

When we apply an operation like convolution to a wave, we say we are working in the time domain, because the wave is a function of time. When we apply an operation like multiplication to the DFT, we are working in the frequency domain, because the DFT is a function of frequency.

Using these terms, we can state the Convolution Theorem more concisely:

Convolution in the time domain corresponds to multiplication in the frequency domain.
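
The theorem can be checked numerically. The following is a minimal sketch, assuming made-up arrays and a convolution computed directly from the definition with indices that wrap around (circular convolution, which is what the DFT version corresponds to). The names `direct`, `lhs`, and `rhs` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64
f = rng.normal(size=N)          # made-up "wave array"
g = np.zeros(N)
g[:5] = 0.2                     # 5-element moving average, zero-padded to N

# Circular convolution straight from the definition:
# (f * g)[n] = sum over m of f[m] * g[(n - m) mod N]
direct = np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                   for n in range(N)])

# Left side of the theorem: DFT of the convolution.
lhs = np.fft.fft(direct)
# Right side: elementwise product of the two DFTs.
rhs = np.fft.fft(f) * np.fft.fft(g)

theorem_holds = np.allclose(lhs, rhs)
```

The explicit double loop takes time proportional to $N^2$, so it is only practical for small arrays; it is used here because it follows the definition line by line.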

And that explains Figure 8-4, because when we convolve a wave and a window, we multiply the spectrum of the wave with the spectrum of the window. To see how that works, we can compute the DFT of the window:

padded = zero_pad(window, N)
dft_window = np.fft.rfft(padded)
thinkplot.plot(abs(dft_window))

padded contains the smoothing window, padded with zeros to be the same length as wave; dft_window contains the DFT of padded.

Figure 8-5 shows the result, along with the ratios we computed in the previous section. The ratios are exactly the amplitudes in dft_window. Mathematically:

$$\frac{\left|\mathrm{DFT}(f * g)\right|}{\left|\mathrm{DFT}(f)\right|} = \left|\mathrm{DFT}(g)\right|$$

In this context, the DFT of a window is called a filter. For any convolution window in the time domain, there is a corresponding filter in the frequency domain. And for any filter that can be expressed by elementwise multiplication in the frequency domain, there is a corresponding window.

Figure 8-5. Ratio of spectrums for the square wave, before and after smoothing, along with the DFT of the smoothing window.

Gaussian Filter

The moving average window we used in the previous section is a low-pass filter, but it is not a very good one. The DFT drops off steeply at first, but then it bounces around. Those bounces are called sidelobes, and they are there because the moving average window is like a square wave, so its spectrum contains high-frequency harmonics that drop off proportionally to $1/f$, which is relatively slow.

We can do better with a Gaussian window. SciPy provides functions that compute many common convolution windows, including gaussian:

gaussian = scipy.signal.gaussian(M=11, std=2)
gaussian /= sum(gaussian)

M is the number of elements in the window; std is the standard deviation of the Gaussian distribution used to compute it. Figure 8-6 shows the shape of the window. It is a discrete approximation of the Gaussian “bell curve”. The figure also shows the moving average window from the previous example, which is sometimes called a boxcar window because it looks like a rectangular railway car.

Figure 8-6. Boxcar and Gaussian windows.

I ran the computations from the previous sections again with this window and generated Figure 8-7, which shows the ratio of the spectrums before and after smoothing, along with the DFT of the Gaussian window.

As a low-pass filter, Gaussian smoothing is better than a simple moving average. After the ratio drops off, it stays low, with almost none of the sidelobes we saw with the boxcar window. So it does a better job of cutting off the higher frequencies.

The reason it does so well is that the DFT of a Gaussian curve is also a Gaussian curve. So the ratio drops off in proportion to $\exp(-f^2)$, which is much faster than $1/f$.

Figure 8-7. Ratio of spectrums before and after Gaussian smoothing, and the DFT of the window.
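
To compare the two filters without plotting, here is a rough sketch that measures the largest gain each one lets through in the upper half of the frequency range. Several things in it are assumptions for illustration: the `zero_pad` helper is a stand-in for the one used in the text, the padded length of 1024 is arbitrary, and the Gaussian window is built by hand from $\exp(-n^2 / 2\,\mathrm{std}^2)$ so the example does not depend on a particular SciPy version (in recent SciPy releases the window function is available as `scipy.signal.windows.gaussian`).

```python
import numpy as np

def zero_pad(array, n):
    """Return a length-n copy of array, padded with zeros at the end."""
    res = np.zeros(n)
    res[:len(array)] = array
    return res

M, std, N = 11, 2, 1024        # window size, Gaussian width, padded length

boxcar = np.ones(M) / M

# Gaussian window built by hand: exp(-n^2 / (2 std^2)), centered, normalized.
n = np.arange(M) - (M - 1) / 2
gaussian = np.exp(-n**2 / (2 * std**2))
gaussian /= gaussian.sum()

# Filter magnitudes: the DFT of each zero-padded window.
box_filter = np.abs(np.fft.rfft(zero_pad(boxcar, N)))
gauss_filter = np.abs(np.fft.rfft(zero_pad(gaussian, N)))

# Largest gain in the upper half of the frequency range (sidelobe region).
upper = slice(len(box_filter) // 2, None)
box_sidelobe = box_filter[upper].max()
gauss_sidelobe = gauss_filter[upper].max()
```

Both filters have gain 1 at frequency 0 because both windows are normalized to sum to 1; the difference is in how much of the high-frequency range leaks through.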

Efficient Convolution

One of the reasons the FFT is such an important algorithm is that, combined with the Convolution Theorem, it provides an efficient way to compute convolution, cross-correlation, and autocorrelation.

Again, the Convolution Theorem states:

$$\mathrm{DFT}(f * g) = \mathrm{DFT}(f) \cdot \mathrm{DFT}(g)$$

So one way to compute a convolution is:

$$f * g = \mathrm{IDFT}\left(\mathrm{DFT}(f) \cdot \mathrm{DFT}(g)\right)$$

where IDFT is the inverse DFT. A simple implementation of convolution takes time proportional to $N^2$; this algorithm, using the FFT, takes time proportional to $N \log N$.

We can confirm that it works by computing the same convolution both ways. As an example, I’ll apply it to the Facebook stock data shown in Figure 8-1:

import pandas as pd

names = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv('fb.csv', header=0, names=names)
ys = df.close.values[::-1]

This example uses Pandas to read the data from the CSV file (included in the repository for this book). If you are not familiar with Pandas, don’t worry: I’m not going to do much with it in this book. But if you’re interested, you can learn more about it in Think Stats at http://thinkstats2.com.

The result, df, is a DataFrame, one of the data structures provided by Pandas. close is a NumPy array that contains daily closing prices.

Next I’ll create a Gaussian window and convolve it with close:

window = scipy.signal.gaussian(M=30, std=6)
window /= window.sum()
smoothed = np.convolve(ys, window, mode='valid')

fft_convolve computes the same thing using the FFT:

from numpy.fft import fft, ifft

def fft_convolve(signal, window):
    fft_signal = fft(signal)
    fft_window = fft(window)
    return ifft(fft_signal * fft_window)

We can test it by padding the window to the same length as ys and then computing the convolution:

padded = zero_pad(window, N)
smoothed2 = fft_convolve(ys, padded)

The result has $M-1$ bogus values at the beginning, where $M$ is the length of the window. We can slice off the bogus values like this:

M = len(window)
smoothed2 = smoothed2[M-1:]

The result from fft_convolve agrees with the result from np.convolve to about 12 digits of precision.
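
The whole comparison can be reproduced without the stock data file. This is a sketch under stated assumptions: the price series is replaced by a synthetic random walk, `zero_pad` is a stand-in for the helper used in the text, and the Gaussian window is built by hand rather than with SciPy.

```python
import numpy as np

def zero_pad(array, n):
    """Return a length-n copy of array, padded with zeros at the end."""
    res = np.zeros(n)
    res[:len(array)] = array
    return res

def fft_convolve(signal, window):
    """Circular convolution via the Convolution Theorem."""
    return np.fft.ifft(np.fft.fft(signal) * np.fft.fft(window))

# Synthetic stand-in for the price series: a random walk (not real data).
rng = np.random.default_rng(17)
ys = 30 + np.cumsum(rng.normal(size=500))

# Gaussian window with M=30, std=6, built by hand and normalized.
M, std = 30, 6
n = np.arange(M) - (M - 1) / 2
window = np.exp(-n**2 / (2 * std**2))
window /= window.sum()

# Time-domain convolution.
smoothed = np.convolve(ys, window, mode='valid')

# Frequency-domain convolution with the window padded to len(ys).
padded = zero_pad(window, len(ys))
smoothed2 = fft_convolve(ys, padded)
# Drop the first M-1 wrapped-around values; keep the real part because
# ifft returns complex numbers with negligible imaginary parts.
smoothed2 = smoothed2[M-1:].real

max_diff = np.max(np.abs(smoothed - smoothed2))
```

After trimming, both arrays have length `len(ys) - M + 1`, and `max_diff` is at the level of floating-point rounding.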

Efficient Autocorrelation

In “Convolution” I presented definitions of cross-correlation and convolution, and we saw that they are almost the same, except that in convolution the window is reversed.

Now that we have an efficient algorithm for convolution, we can also use it to compute cross-correlations and autocorrelations. Using the data from the previous section, we can compute the autocorrelation of Facebook stock prices:

corrs = np.correlate(close, close, mode='same')

With mode='same', the result has the same length as close, corresponding to lags from $-N/2$ to $N/2-1$. The gray line in Figure 8-8 shows the result. Except at lag=0, there are no peaks, so there is no apparent periodic behavior in this signal. However, the autocorrelation function drops off slowly, suggesting that this signal resembles pink noise, as we saw in “Autocorrelation”.

Figure 8-8. Autocorrelation functions computed by NumPy and fft_autocorr.

To compute autocorrelation using convolution, we have to zero-pad the signal to double the length. This trick is necessary because the FFT is based on the assumption that the signal is periodic; that is, that it wraps around from the end to the beginning. With time-series data like this, that assumption is invalid. Adding zeros, and then trimming the results, removes the bogus values.

Also, remember that convolution reverses the direction of the window. In order to cancel that effect, we reverse the direction of the window before calling fft_convolve, using np.flipud, which flips a NumPy array. The result is a view of the array, not a copy, so this operation is fast:

def fft_autocorr(signal):
    N = len(signal)
    signal = thinkdsp.zero_pad(signal, 2*N)
    window = np.flipud(signal)

    corrs = fft_convolve(signal, window)
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs

The result from fft_convolve has length $2N$. Of those, the first and last $N/2$ are valid; the rest are the result of zero-padding. To select the valid elements, we roll the results and select the first $N$, corresponding to lags from $-N/2$ to $N/2-1$.

As shown in Figure 8-8, the results from fft_autocorr and np.correlate are identical (with about 9 digits of precision).
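
Here is a self-contained version of that comparison, as a sketch: the series is a synthetic random walk standing in for the stock prices, `zero_pad` and `fft_convolve` are local stand-ins for the helpers in the text, and this `fft_autocorr` returns only the real part of the result.

```python
import numpy as np

def zero_pad(array, n):
    """Return a length-n copy of array, padded with zeros at the end."""
    res = np.zeros(n)
    res[:len(array)] = array
    return res

def fft_convolve(signal, window):
    """Circular convolution via the Convolution Theorem."""
    return np.fft.ifft(np.fft.fft(signal) * np.fft.fft(window))

def fft_autocorr(signal):
    N = len(signal)
    signal = zero_pad(signal, 2*N)     # padding prevents wrap-around
    window = np.flipud(signal)         # cancel the reversal in convolution
    corrs = fft_convolve(signal, window)
    # Rotate so the first N elements hold lags -N//2 ... N - N//2 - 1.
    corrs = np.roll(corrs, N//2+1)[:N]
    return corrs.real

# Synthetic stand-in for the closing prices (not real data).
rng = np.random.default_rng(3)
close = 30 + np.cumsum(rng.normal(size=400))

corrs = np.correlate(close, close, mode='same')
corrs2 = fft_autocorr(close)

# Largest disagreement, relative to the largest correlation value.
rel_err = np.max(np.abs(corrs - corrs2)) / np.max(np.abs(corrs))
```

The error is measured relative to the peak because the raw correlations are large numbers, so an absolute tolerance would be hard to interpret.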

Notice that the correlations in Figure 8-8 are large numbers; we could normalize them (between –1 and 1) as shown in “Using NumPy”.

The strategy we used here for autocorrelation also works for cross-correlation. Again, you have to prepare the signals by flipping one and padding both, and then you have to trim the invalid parts of the result. This padding and trimming is a nuisance, but that’s why libraries like NumPy provide functions to do it for you.

Exercises

Solutions to these exercises are in chap08soln.ipynb.

Exercise 8-1.

The notebook for this chapter is chap08.ipynb. Read through it and run the code.

It contains an interactive widget that lets you experiment with the parameters of the Gaussian window to see what effect they have on the cutoff frequency.

What goes wrong when you increase the width of the Gaussian window, std, without increasing the number of elements in the window, M?

Exercise 8-2.

In this chapter I claimed that the Fourier Transform of a Gaussian curve is also a Gaussian curve. For Discrete Fourier Transforms, this relationship is approximately true.

Try it out for a few examples. What happens to the Fourier Transform as you vary std?

Exercise 8-3.

If you did the exercises in Chapter 3, you saw the effect of the Hamming window, and some of the other windows provided by NumPy, on spectral leakage. We can get some insight into the effects of these windows by looking at their DFTs.

In addition to the Gaussian window we used in this chapter, create a Hamming window with the same size. Zero-pad the windows and plot their DFTs. Which window acts as a better low-pass filter? You might find it useful to plot the DFTs on a log-y scale.

Experiment with a few different windows and a few different sizes.

Chapter 9. Differentiation and Integration

This chapter picks up where the previous chapter left off, looking at the relationship between windows in the time domain and filters in the frequency domain.

In particular, we’ll look at the effects of a finite difference window, which approximates differentiation, and the cumulative sum operation, which approximates integration.

The code for this chapter is in chap09.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp09.

Finite Differences

In “Smoothing”, we applied a smoothing window to the daily stock price of Facebook and found that a smoothing window in the time domain corresponds to a low-pass filter in the frequency domain.

In this section, we’ll look at daily price changes and see that computing the difference between successive elements, in the time domain, corresponds to a high-pass filter.

Here’s the code to read the data, store it as a wave, and compute its spectrum:

import pandas as pd

names = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv('fb.csv', header=0, names=names)
ys = df.close.values[::-1]
close = thinkdsp.Wave(ys, framerate=1)
spectrum = close.make_spectrum()

This example uses Pandas to read the CSV file; the result is a DataFrame, df, with columns for the opening price, closing price, and high and low prices. I select the closing prices and save them in a Wave object. The frame rate is 1 sample per day.

Figure 9-1 shows this time series and its spectrum. Visually, the time series resembles Brownian noise (see “Brownian Noise”). And the spectrum looks like a straight line, albeit a noisy one. The estimated slope is –1.9, which is consistent with Brownian noise.

Figure 9-1. Daily closing price of Facebook stock and the spectrum of this time series.

Now let’s compute the daily price change using np.diff:

diff = np.diff(ys)
change = thinkdsp.Wave(diff, framerate=1)
change_spectrum = change.make_spectrum()

Figure 9-2 shows the resulting wave and its spectrum. The daily changes resemble white noise, and the estimated slope of the spectrum, –0.06, is near zero, which is what we expect for white noise.

Figure 9-2. Daily price change of Facebook stock and the spectrum of this time series.

The Frequency Domain

Computing the difference between successive elements is the same as convolution with the window [1, -1]. If the order of those elements seems backward, remember that convolution reverses the window before applying it to the signal.
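
A small check of that equivalence, as a sketch with made-up numbers rather than the actual price data:

```python
import numpy as np

# Made-up prices, for illustration only.
ys = np.array([38.2, 34.0, 31.0, 32.0, 33.0, 31.9])

diff = np.diff(ys)

# Convolution reverses the window, so [1, -1] yields ys[n] - ys[n-1].
conv = np.convolve(ys, [1, -1], mode='valid')

# Cross-correlation does not reverse the window, so it needs [-1, 1].
corr = np.correlate(ys, [-1, 1], mode='valid')

all_equal = np.allclose(diff, conv) and np.allclose(diff, corr)
```

All three arrays have one element fewer than `ys`, because a difference needs two neighboring values.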

We can see the effect of this operation in the frequency domain by computing the DFT of the window:

diff_window = np.array([1.0, -1.0])
padded = thinkdsp.zero_pad(diff_window, len(close))
diff_wave = thinkdsp.Wave(padded, framerate=close.framerate)
diff_filter = diff_wave.make_spectrum()

Figure 9-3 shows the result. The finite difference window corresponds to a high-pass filter: its amplitude increases with frequency, linearly for low frequencies and then sublinearly after that. In the next section, we’ll see why.

Figure 9-3. Filters corresponding to the diff and differentiate operators (left) and integration operator (right, log-y scale).

Differentiation

The window we used in the previous section is a numerical approximation of the first derivative, so the filter approximates the effect of differentiation.

Differentiation in the time domain corresponds to a simple filter in the frequency domain; we can figure out what it is with a little math.

Suppose we have a complex sinusoid with frequency $f$:

$$E_f(t) = e^{2 \pi i f t}$$

The first derivative of $E_f$ is:

$$\frac{d}{dt} E_f(t) = 2 \pi i f \, e^{2 \pi i f t}$$

which we can rewrite as:

$$\frac{d}{dt} E_f(t) = 2 \pi i f \, E_f(t)$$

In other words, taking the derivative of $E_f$ is the same as multiplying by $2 \pi i f$, which is a complex number with magnitude $2 \pi f$ and angle $\pi/2$.

We can compute the filter that corresponds to differentiation like this:

deriv_filter = close.make_spectrum()
deriv_filter.hs = PI2 * 1j * deriv_filter.fs

I started with the spectrum of close, which has the right size and frame rate, then replaced the hs with $2 \pi i f$. Figure 9-3 (left) shows this filter; it is a straight line.
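
The two filters can also be compared numerically. With a frame rate of 1 sample per day, the DFT of the window [1, -1] at frequency $f$ is $1 - e^{-2 \pi i f}$, whose magnitude is $2 \sin(\pi f)$; for small $f$ this is close to $2 \pi f$, the gain of the differentiation filter. The sketch below checks both statements; the padded length of 1000 and the 0.05 cutoff for "low" frequencies are arbitrary choices for illustration.

```python
import numpy as np

N = 1000                               # arbitrary padded length
padded = np.zeros(N)
padded[:2] = [1.0, -1.0]               # the diff window, zero-padded

fs = np.fft.rfftfreq(N, d=1.0)         # cycles per sample, 0 to 0.5
diff_gain = np.abs(np.fft.rfft(padded))

exact = 2 * np.sin(np.pi * fs)         # |1 - exp(-2*pi*i*f)|
linear = 2 * np.pi * fs                # gain of the differentiation filter

# At low frequencies (skipping f = 0) the two gains nearly coincide.
low = (fs > 0) & (fs < 0.05)
rel_err_low = np.max(np.abs(diff_gain[low] - linear[low]) / linear[low])
```

At the highest frequency, 0.5 cycles per sample, the diff filter has gain 2 while the differentiation filter has gain $\pi$, which is the sublinear growth visible in the figure.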

As we saw in “Synthesis with Matrices”, multiplying a complex sinusoid by a complex number has two effects: it multiplies the amplitude, in this case by $2 \pi f$, and shifts the phase offset, in this case by $\pi/2$.

If you are familiar with the language of operators and eigenfunctions, each $E_f$ is an eigenfunction of the differentiation operator, with the corresponding eigenvalue $2 \pi i f$. See http://en.wikipedia.org/wiki/Eigenfunction.

If you are not familiar with that language, here’s what it means:

  • An operator is a function that takes a function and returns another function. For example, differentiation is an operator.

  • A function, $g$, is an eigenfunction of an operator, $\mathcal{A}$, if applying $\mathcal{A}$ to $g$ has the effect of multiplying the function by a scalar. That is, $\mathcal{A} g = \lambda g$.

  • In that case, the scalar λ is the eigenvalue that corresponds to the eigenfunction g.

  • A given operator might have many eigenfunctions, each with a corresponding eigenvalue.

Because complex sinusoids are eigenfunctions of the differentiation operator, they are easy to differentiate. All we have to do is multiply by a complex scalar.

For signals with more than one component, the process is only slightly harder:

  1. Express the signal as the sum of complex sinusoids.

  2. Compute the derivative of each component by multiplication.

  3. Add up the differentiated components.

If that process sounds familiar, that’s because it is identical to the algorithm for convolution in “Efficient Convolution”: compute the DFT, multiply by a filter, and compute the inverse DFT.
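
The three steps can be sketched directly with NumPy's FFT functions. This is an illustration under assumptions, not the implementation used by the book's library: the signal is a made-up 3 Hz sine sampled for exactly one second, so it is periodic over the window and the result can be compared with the analytic derivative.

```python
import numpy as np

framerate = 1000
N = 1000                                   # exactly one second of samples
ts = np.arange(N) / framerate
ys = np.sin(2 * np.pi * 3 * ts)            # 3 Hz sine, periodic over the window

# 1. Express the signal as a sum of complex sinusoids.
hs = np.fft.rfft(ys)
fs = np.fft.rfftfreq(N, d=1/framerate)     # frequency of each component, in Hz

# 2. Differentiate each component by multiplying by 2*pi*i*f.
hs_deriv = hs * (2j * np.pi * fs)

# 3. Add the differentiated components back up.
deriv = np.fft.irfft(hs_deriv, n=N)

# Analytic derivative of sin(2*pi*3*t), for comparison.
expected = 2 * np.pi * 3 * np.cos(2 * np.pi * 3 * ts)
max_err = np.max(np.abs(deriv - expected))
```

The agreement is this good only because the sine completes a whole number of cycles in the window; for a signal that is not periodic over the window, the same code produces artifacts near the boundaries.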

Spectrum provides a method that applies the differentiation filter:

# class Spectrum:

    def differentiate(self):
        self.hs *= PI2 * 1j * self.fs

We can use it to compute the derivative of the Facebook time series:

deriv_spectrum = close.make_spectrum()
deriv_spectrum.differentiate()
deriv = deriv_spectrum.make_wave()

Figure 9-4 compares the daily price changes computed by np.diff with the derivative we just computed. I selected the first 50 values in the time series so we can see the differences more clearly.

Figure 9-4. Comparison of daily price changes computed by np.diff and by applying the differentiation filter.

The derivative is noisier, because it amplifies the high-frequency components more, as shown in Figure 9-3 (left). Also, the first few elements of the derivative are very noisy. The problem there is that the DFT-based derivative is based on the assumption that the signal is periodic. In effect, it connects the last element in the time series back to the first element, which creates artifacts at the boundaries.

To summarize, we have shown:

  • Computing the difference between successive values in a signal can be expressed as convolution with a simple window. The result is an approximation of the first derivative.

  • Differentiation in the time domain corresponds to a simple filter in the frequency domain. For periodic signals, the result is the first derivative, exactly. For some non-periodic signals, it can approximate the derivative.

Using the DFT to compute derivatives is the basis of spectral methods for solving differential equations (see http://en.wikipedia.org/wiki/Spectral_method).

In particular, it is useful for the analysis of linear, time-invariant systems, which is coming up in Chapter 10.

Integration

In the previous section, we showed that differentiation in the time domain corresponds to a simple filter in the frequency domain: it multiplies each component by $2 \pi i f$. Since integration is the inverse of differentiation, it also corresponds to a simple filter: it divides each component by $2 \pi i f$.

We can compute this filter like this:

integ_filter = close.make_spectrum()
integ_filter.hs = 1 / (PI2 * 1j * integ_filter.fs)

Figure 9-3 (right) shows this filter on a log-y scale, which makes it easier to see.

Spectrum provides a method that applies the integration filter:

# class Spectrum:

    def integrate(self):
        self.hs /= PI2 * 1j * self.fs

We can confirm that the integration filter is correct by applying it to the spectrum of the derivative we just computed:

integ_spectrum = deriv_spectrum.copy()
integ_spectrum.integrate()

But notice that at $f=0$, we are dividing by 0. The result in NumPy is NaN, which is a special floating-point value that represents “not a number”. We can partially deal with this problem by setting this value to 0 before converting the spectrum back to a wave:

integ_spectrum.hs[0] = 0
integ_wave = integ_spectrum.make_wave()

Figure 9-5 shows this integrated derivative along with the original time series. They are almost identical, but the integrated derivative has been shifted down. The problem is that when we clobbered the $f=0$ component, we set the bias of the signal to 0. But that should not be surprising; in general, differentiation loses information about the bias, and integration can’t recover it. In some sense, the NaN at $f=0$ is telling us that this element is unknown.

Figure 9-5. Comparison of the original time series and the integrated derivative.

If we provide this “constant of integration”, the results are identical, which confirms that this integration filter is the correct inverse of the differentiation filter.
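
The round trip can be sketched with plain NumPy. The signal here is made up (two sinusoids plus a bias of 2.0, periodic over the window) and is not the stock data. Instead of dividing by zero and then overwriting the NaN, this sketch leaves the $f=0$ entry of the filter at 0 from the start, which has the same effect.

```python
import numpy as np

framerate = 1000
N = 1000
ts = np.arange(N) / framerate
# Made-up periodic signal with a bias (mean) of 2.0.
ys = 2.0 + np.sin(2*np.pi*3*ts) + 0.5*np.cos(2*np.pi*7*ts)

fs = np.fft.rfftfreq(N, d=1/framerate)

# Differentiate in the frequency domain, then return to the time domain.
deriv = np.fft.irfft(np.fft.rfft(ys) * (2j*np.pi*fs), n=N)

# Integrate: divide every component by 2*pi*i*f, except f = 0,
# which is left at 0 to avoid dividing by zero.
hs = np.fft.rfft(deriv)
integ_filter = np.zeros(len(fs), dtype=complex)
integ_filter[1:] = 1 / (2j*np.pi*fs[1:])
integ = np.fft.irfft(hs * integ_filter, n=N)

# The shape is recovered, but the bias is lost.
offset = np.mean(ys - integ)

# Supplying the bias as the "constant of integration" restores the signal.
recovered = integ + ys.mean()
max_err = np.max(np.abs(recovered - ys))
```

In this sketch `offset` comes out as the bias of the original signal, which is exactly the information that differentiation discarded.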

Cumulative Sum

In the same way that the diff operator approximates differentiation, the cumulative sum approximates integration. I’ll demonstrate with a sawtooth signal:

signal = thinkdsp.SawtoothSignal(freq=50)
in_wave = signal.make_wave(duration=0.1, framerate=44100)

Figure 9-6 shows this wave and its spectrum.

Wave provides a method that computes the cumulative sum of a wave array and returns a new Wave object:

# class Wave:

    def cumsum(self):
        ys = np.cumsum(self.ys)
        ts = self.ts.copy()
        return Wave(ys, ts, self.framerate)

We can use it to compute the cumulative sum of in_wave:

out_wave = in_wave.cumsum()
out_wave.unbias()

Figure 9-7 shows the resulting wave and its spectrum. If you did the exercises in Chapter 2, this waveform should look familiar: it’s a parabolic signal.

Figure 9-6. A sawtooth wave and its spectrum.

Figure 9-7. A parabolic wave and its spectrum.

Comparing the two, we see that the amplitudes of the components drop off more quickly in the spectrum of the parabolic signal than in the spectrum of the sawtooth signal. In Chapter 2, we saw that the components of the sawtooth drop off in proportion to $1/f$. Since the cumulative sum approximates integration, and integration filters components in proportion to $1/f$, the components of the parabolic wave drop off in proportion to $1/f^2$.

We can see that graphically by computing the filter that corresponds to the cumulative sum:

cumsum_filter = diff_filter.copy()
cumsum_filter.hs = 1 / cumsum_filter.hs

Because cumsum is the inverse operation of diff, we start with a copy of diff_filter, which is the filter that corresponds to the diff operation, and then invert the hs.

Figure 9-8 shows the filters corresponding to cumulative sum and integration. The cumulative sum is a good approximation of integration except at the highest frequencies, where it drops off a little faster.

Figure 9-8. Filters corresponding to cumulative sum and integration.

To confirm that this is the correct filter for the cumulative sum, we can compare it to the ratio of the spectrum of out_wave to the spectrum of in_wave:

in_spectrum = in_wave.make_spectrum()
out_spectrum = out_wave.make_spectrum()
ratio_spectrum = out_spectrum.ratio(in_spectrum, thresh=1)

And here’s the method that computes the ratios:

def ratio(self, denom, thresh=1):
    ratio_spectrum = self.copy()
    ratio_spectrum.hs /= denom.hs
    ratio_spectrum.hs[denom.amps < thresh] = np.nan
    return ratio_spectrum

When denom.amps is small, the resulting ratio is noisy, so I set those values to NaN.

Figure 9-9 shows the ratios and the filter corresponding to the cumulative sum. They agree, which confirms that inverting the filter for diff yields the filter for cumsum.

Figure 9-9. Filter corresponding to the cumulative sum, and the actual ratios of the before-and-after spectrums.

Finally, we can confirm that the Convolution Theorem applies by applying the cumsum filter in the frequency domain:

out_wave2 = (in_spectrum * cumsum_filter).make_wave()

Within the limits of floating-point error, out_wave2 is identical to out_wave, which we computed using cumsum, so the Convolution Theorem works! But note that this demonstration only works with periodic signals.
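
The same check can be sketched with NumPy alone, without thinkdsp. This is only an illustration under stated assumptions: the input is a made-up zero-mean random array treated as one period of a periodic signal, and the zero-frequency gain (where the diff filter is exactly 0) is set to 0 rather than inverted, so the two results agree only up to a constant offset.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 64
x = rng.standard_normal(N)
x -= x.mean()                      # zero mean, so the cumulative sum returns to ~0

# Filter for the first difference: DFT of the window [1, -1, 0, ..., 0]
diff_window = np.zeros(N)
diff_window[0], diff_window[1] = 1, -1
diff_filter = np.fft.fft(diff_window)

# Invert the filter everywhere except k = 0, where diff_filter is exactly 0
cumsum_filter = np.zeros(N, dtype=complex)
cumsum_filter[1:] = 1 / diff_filter[1:]

y_time = np.cumsum(x)                                      # time domain
y_freq = np.fft.ifft(np.fft.fft(x) * cumsum_filter).real   # frequency domain

# The filter cannot recover the constant term, so compare after removing the mean
print(np.allclose(y_freq, y_time - y_time.mean()))
```

Dropping the `x -= x.mean()` line is a quick way to see the caveat about periodic signals: the comparison then fails, because the cumulative sum no longer ends where it started.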

Integrating Noise

In “Brownian Noise”, we generated Brownian noise by computing the cumulative sum of white noise. Now that we understand the effect of cumsum in the frequency domain, we have some insight into the spectrum of Brownian noise.

White noise has equal power at all frequencies, on average. When we compute the cumulative sum, the amplitude of each component is divided by $f$. Since power is the square of magnitude, the power of each component is divided by $f^2$. So on average, the power at frequency $f$ is proportional to $1/f^2$:

$$P_f = K / f^2$$

where $K$ is a constant that’s not important. Taking the log of both sides yields:

$$\log P_f = \log K - 2 \log f$$

And that’s why, when we plot the spectrum of Brownian noise on a log-log scale, we expect to see a straight line with slope –2, at least approximately.
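
A rough numerical check of that slope, using NumPy only. The array length, the seed, and the choice to fit only the lowest part of the band (where the cumulative sum tracks integration most closely) are assumptions made for this sketch, not values from the book.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2 ** 14
white = rng.standard_normal(N)     # white noise
brown = np.cumsum(white)           # Brownian noise as a cumulative sum

power = np.abs(np.fft.rfft(brown)) ** 2
freqs = np.fft.rfftfreq(N, d=1.0)  # cycles per sample

# Skip f = 0 (log undefined) and fit a line on a log-log scale
sel = slice(1, N // 16)
slope, intercept = np.polyfit(np.log(freqs[sel]), np.log(power[sel]), 1)
print(round(slope, 2))             # expected to land near -2
```

The estimate varies from run to run because the power at each frequency fluctuates around its average, which is why the text says "at least approximately".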

In “Finite Differences” we plotted the spectrum of closing prices for Facebook stock, and estimated that the slope is –1.9, which is consistent with Brownian noise. Many stock prices have similar spectrums.

When we used the diff operator to compute daily changes, we multiplied the amplitude of each component by a filter proportional to $f$, which means we multiplied the power of each component by $f^2$. On a log-log scale, this operation adds 2 to the slope of the power spectrum, which is why the estimated slope of the result is near 0.1 (but a little lower, because diff only approximates differentiation).

Exercises

Solutions to these exercises are in chap09soln.ipynb.

Exercise 9-1.

The notebook for this chapter is chap09.ipynb. Read through it and run the code.

In “Cumulative Sum”, I mentioned that some of the examples don’t work with non-periodic signals. Try replacing the sawtooth wave, which is periodic, with the Facebook data, which is not, and see what goes wrong.

Exercise 9-2.

The goal of this exercise is to explore the effects of diff and differentiate on a signal. Create a triangle wave and plot it. Apply diff and plot the result. Compute the spectrum of the triangle wave, apply differentiate, and plot the result. Convert the spectrum back to a wave and plot it. Are there differences between the effects of diff and differentiate for this wave?

Exercise 9-3.

The goal of this exercise is to explore the effects of cumsum and integrate on a signal. Create a square wave and plot it. Apply cumsum and plot the result. Compute the spectrum of the square wave, apply integrate, and plot the result. Convert the spectrum back to a wave and plot it. Are there differences between the effects of cumsum and integrate for this wave?

Exercise 9-4.

The goal of this exercise is to explore the effect of integrating twice. Create a sawtooth wave, compute its spectrum, then apply integrate twice. Plot the resulting wave and its spectrum. What is the mathematical form of the wave? Why does it resemble a sinusoid?

Exercise 9-5.

The goal of this exercise is to explore the effects of the second difference and second derivative. Create a CubicSignal, which is defined in thinkdsp. Compute the second difference by applying diff twice. What does the result look like? Compute the second derivative by applying differentiate to the spectrum twice. Does the result look the same?

Plot the filters that correspond to the second difference and the second derivative and compare them. Hint: in order to get the filters on the same scale, use a wave with frame rate 1.

Chapter 10. LTI Systems

This chapter presents the theory of signals and systems, using musical acoustics as an example. It explains an important application of the Convolution Theorem, characterization of linear, time-invariant systems (which I’ll define soon).

The code for this chapter is in chap10.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp10.

Signals and Systems

In the context of signal processing, a system is an abstract representation of anything that takes a signal as input and produces a signal as output.

For example, an electronic amplifier is a circuit that takes an electrical signal as input and produces a (louder) signal as output.

As another example, when you listen to a musical performance, you can think of the room as a system that takes the sound of the performance at the location where it is generated and produces a somewhat different sound at the location where you hear it.

A linear, time-invariant system1 is a system with these two properties:

Linearity

If you put two inputs into the system at the same time, the result is the sum of their outputs. Mathematically, if an input $x_1$ produces output $y_1$ and another input $x_2$ produces $y_2$, then $a x_1 + b x_2$ produces $a y_1 + b y_2$, where $a$ and $b$ are scalars.

Time invariance

The effect of the system doesn’t vary over time, or depend on the state of the system. So if inputs x1 and x2 differ only by a shift in time, their outputs, y1 and y2, differ by the same shift but are otherwise identical.
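
Both properties can be checked numerically for a concrete system. The sketch below uses a 2-element moving average as a stand-in; the function name `system`, the scalars, and the test signals are all hypothetical choices for illustration.

```python
import numpy as np

def system(x):
    """A stand-in LTI system: a 2-element moving average (full convolution)."""
    return np.convolve(x, [0.5, 0.5])

rng = np.random.default_rng(2)
x1 = rng.standard_normal(16)
x2 = rng.standard_normal(16)
a, b = 3.0, -0.5

# Linearity: a*x1 + b*x2 produces a*y1 + b*y2
linear = np.allclose(system(a * x1 + b * x2),
                     a * system(x1) + b * system(x2))

# Time invariance: delaying the input by `shift` samples delays the output
# by the same number of samples and changes nothing else
shift = 5
delayed_input = np.concatenate([np.zeros(shift), x1])
delayed_output = np.concatenate([np.zeros(shift), system(x1)])
time_invariant = np.allclose(system(delayed_input), delayed_output)

print(linear, time_invariant)
```

A system that squares its input, or whose gain changes from one sample to the next, would fail the first or second check respectively.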

Many physical systems have these properties, at least approximately:

  • Circuits that contain only resistors, capacitors, and inductors are LTI, to the degree that the components behave like their idealized models.

  • Mechanical systems that contain springs, masses, and dashpots are also LTI, assuming linear springs (force proportional to displacement) and dashpots (force proportional to velocity).

  • Also, and most relevant to applications in this book, the media that transmit sound (including air, water, and solids) are well modeled by LTI systems.

LTI systems are described by linear differential equations, and the solutions of those equations are complex sinusoids (see http://en.wikipedia.org/wiki/Linear_differential_equation).

This result provides an algorithm for computing the effect of an LTI system on an input signal:

  1. Express the signal as the sum of complex sinusoid components.

  2. For each input component, compute the corresponding output component.

  3. Add up the output components.
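
The three steps above can be sketched in a few lines of NumPy. The helper name `apply_lti` and the example system (halve the amplitude and delay by one sample, circularly) are assumptions made for this illustration; the DFT plays the role of the decomposition.

```python
import numpy as np

def apply_lti(x, transfer):
    """Apply an LTI system, given as one complex gain per frequency, to x."""
    components = np.fft.fft(x)          # 1. decompose into complex sinusoids
    outputs = components * transfer     # 2. scale and phase-shift each component
    return np.fft.ifft(outputs).real    # 3. add the output components back up

N = 8
k = np.arange(N)
# Hypothetical system: gain 0.5 and a one-sample circular delay
transfer = 0.5 * np.exp(-2j * np.pi * k / N)

x = np.array([1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, -2.0])
y = apply_lti(x, transfer)
print(np.allclose(y, 0.5 * np.roll(x, 1)))
```

Taking `.real` at the end is safe here only because this transfer function has conjugate symmetry, so the output of a real input is real up to floating-point error.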

At this point, I hope this algorithm sounds familiar. It’s the same algorithm we used for convolution in “Efficient Convolution”, and for differentiation in “Differentiation”. This process is called spectral decomposition because we “decompose” the input signal into its spectral components.

In order to apply this process to an LTI system, we have to characterize the system by finding its effect on each component of the input signal. For mechanical systems, it turns out that there is a simple and efficient way to do that: you kick it and record the output.

Technically, the “kick” is called an impulse and the output is called the impulse response. You might wonder how a single impulse can completely characterize a system. You can see the answer by computing the DFT of an impulse. Here’s a wave array with an impulse at $t = 0$:

impulse = np.zeros(8)
impulse[0] = 1
impulse_spectrum = np.fft.fft(impulse)

Here’s the wave array:

[ 1.  0.  0.  0.  0.  0.  0.  0.]

And here’s its spectrum:

[ 1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j  1.+0.j]

The spectrum is all ones; that is, an impulse is the sum of components with equal magnitudes at all frequencies. This spectrum should not be confused with white noise, which has the same average power at all frequencies, but varies around that average.

When you test a system by inputting an impulse, you are testing the response of the system at all frequencies. And you can test them all at the same time because the system is linear, so simultaneous tests don’t interfere with each other.

Windows and Filters

To show why this kind of system characterization works, I will start with a simple example: a 2-element moving average. We can think of this operation as a system that takes a signal as an input and produces a slightly smoother signal as an output.

In this example we know what the window is, so we can compute the corresponding filter. But that’s not usually the case; in the next section we’ll look at an example where we don’t know the window or the filter ahead of time.

Here’s a window that computes a 2-element moving average (see “Smoothing”):

window_array = np.array([0.5, 0.5, 0, 0, 0, 0, 0, 0,])
window = thinkdsp.Wave(window_array, framerate=8)

We can find the corresponding filter by computing the DFT of the window:

filtr = window.make_spectrum(full=True)

Figure 10-1 shows the result. The filter that corresponds to a moving average window is a low-pass filter with the approximate shape of a Gaussian curve.

Figure 10-1. DFT of a 2-element moving average window.

Now imagine that we did not know the window or the corresponding filter, and we wanted to characterize this system. We would do that by inputting an impulse and measuring the impulse response.

In this example, we can compute the impulse response by multiplying the spectrum of the impulse and the filter, and then converting the result from a spectrum to a wave:

product = impulse_spectrum * filtr
filtered = product.make_wave()

Since impulse_spectrum is all ones, the product is identical to the filter, and the filtered wave is identical to the window.

This example demonstrates two things:

  • Because the spectrum of an impulse is all ones, the DFT of the impulse response is identical to the filter that characterizes the system.

  • Therefore, the impulse response is identical to the convolution window that characterizes the system.
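
The two conclusions above can be reproduced with plain NumPy by treating the system as a black box. In this sketch, `black_box` is a hypothetical function standing in for a system we cannot inspect; it happens to be a circular 2-element moving average, but the characterization step never uses that knowledge.

```python
import numpy as np

def black_box(x):
    """Hypothetical unknown system (secretly a circular 2-element moving average)."""
    return 0.5 * x + 0.5 * np.roll(x, 1)

N = 8
impulse = np.zeros(N)
impulse[0] = 1

# Characterize the system: kick it and record the output
impulse_response = black_box(impulse)
filtr = np.fft.fft(impulse_response)     # DFT of the impulse response

# Use the filter to predict the output for some other input
x = np.arange(N, dtype=float)
predicted = np.fft.ifft(np.fft.fft(x) * filtr).real

print(impulse_response)                  # matches the window [0.5, 0.5, 0, ...]
print(np.allclose(predicted, black_box(x)))
```

The prediction matches because the black box is linear and time-invariant; for a system without those properties, one impulse response would not be enough.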

Acoustic Response

To characterize the acoustic response of a room or open space, a simple way to generate an impulse is to pop a balloon or fire a gun. The result is an input signal that approximates an impulse, so the sound you hear approximates the impulse response.

As an example, I’ll use a recording of a gunshot to characterize the room where the gun was fired, then use the impulse response to simulate the effect of that room on a violin recording.

This example is in chap10.ipynb, which is in the repository for this book; you can also view it, and listen to the examples, at http://tinyurl.com/thinkdsp10.

Here’s the gunshot:

response = thinkdsp.read_wave('180961__kleeb__gunshots.wav')
response = response.segment(start=0.26, duration=5.0)
response.normalize()
response.plot()

I select a segment starting at 0.26 seconds to remove the silence before the gunshot. Figure 10-2 (left) shows the waveform of the gunshot. Next I compute the DFT of response:

transfer = response.make_spectrum()
transfer.plot()

Figure 10-2. Waveform of a gunshot.

Figure 10-2 (right) shows the result. This spectrum encodes the response of the room; for each frequency, the spectrum contains a complex number that represents an amplitude multiplier and a phase shift. This spectrum is called a transfer function because it contains information about how the system transfers the input to the output.

Now we can simulate the effect this room would have on the sound of a violin. Here is the violin recording we used in “Periodic Signals”:

violin = thinkdsp.read_wave('92002__jcveliz__violin-origional.wav')
violin.truncate(len(response))
violin.normalize()

The violin and gunshot waves were sampled at the same frame rate, 44,100 Hz. And coincidentally, the duration of both is about the same. I trimmed the violin wave to the same length as the gunshot.

Next I compute the DFT of the violin wave:

spectrum = violin.make_spectrum()

Now I know the magnitude and phase of each frequency component in the input, and I know the transfer function of the system. Their product is the DFT of the output, which we can use to compute the output wave:

output = (spectrum * transfer).make_wave()
output.normalize()
output.plot()

Figure 10-3 shows the input (top) and output (bottom) of the system. They are substantially different, and the differences are clearly audible. Load chap10.ipynb and listen to them. One thing I find striking about this example is that you can get a sense of what the room is like; to me, it sounds like a long, narrow room with hard floors and ceilings. That is, like a firing range.

Figure 10-3. The waveform of the violin recording before and after convolution.

There’s one thing I glossed over in this example that I’ll mention in case it bothers anyone. The violin recording I started with has already been transformed by one system: the room where it was recorded. So what I really computed in my example is the sound of the violin after two transformations. To properly simulate the sound of a violin in a different room, I should have characterized the room where the violin was recorded and applied the inverse of that transfer function first.

Systems and Convolution

If you think the previous example is black magic, you are not alone. I’ve been thinking about it for a while and it still makes my head hurt.

In the previous section, I suggested one way to think about it:

  • An impulse is made up of components with amplitude 1 at all frequencies.

  • The impulse response contains the sum of the responses of the system to all of these components.

  • The transfer function, which is the DFT of the impulse response, encodes the effect of the system on each frequency component in the form of an amplitude multiplier and a phase shift.

  • For any input, we can compute the response of the system by breaking the input into components, computing the response to each component, and adding them up.

But if you don’t like that, there’s another way to think about it altogether: convolution! By the Convolution Theorem, multiplication in the frequency domain corresponds to convolution in the time domain. In this example, the output of the system is the convolution of the input and the system response.

Here are the keys to understanding why that works:

  • You can think of the samples in the input wave as a sequence of impulses with varying amplitude.

  • Each impulse in the input yields a copy of the impulse response, shifted in time (because the system is time-invariant) and scaled by the amplitude of the input.

  • The output is the sum of the shifted, scaled copies of the impulse response. The copies add up because the system is linear.

Let’s work our way up gradually. Suppose that instead of firing one gun, we fire two: a big one with amplitude 1 at $t = 0$ and a smaller one with amplitude 0.5 at $t = 1$.

We can compute the response of the system by adding up the original impulse response and a scaled, shifted copy of itself. Here’s a function that makes a shifted, scaled copy of a wave:

def shifted_scaled(wave, shift, factor):
    res = wave.copy()
    res.shift(shift)
    res.scale(factor)
    return res

The parameter shift is a time shift in seconds; factor is a multiplicative factor.

Here’s how we use it to compute the response to a two-gun salute:

shift = 1
factor = 0.5
gun2 = response + shifted_scaled(response, shift, factor)

Figure 10-4 shows the result. You can hear what it sounds like in chap10.ipynb. Not surprisingly, it sounds like two gunshots, the first one louder than the second.

Figure 10-4. Sum of a wave and a shifted, scaled copy.

Now suppose instead of 2 guns, you were to add up 100 guns fired at a rate of 441 shots per second. This loop computes the result:

dt = 1 / 441
total = 0
for k in range(100):
    total += shifted_scaled(response, k*dt, 1.0)

With 441 shots per second, you don’t hear the individual shots. Instead, it sounds like a periodic signal at 441 Hz. If you play this example, it sounds like a car horn in a garage.

And that brings us to a key insight: you can think of any wave as a series of samples, where each sample is an impulse with a different amplitude.

As an example, I’ll generate a sawtooth signal at 441 Hz:

signal = thinkdsp.SawtoothSignal(freq=441)
wave = signal.make_wave(duration=0.1,
                        framerate=response.framerate)

Now I’ll loop through the series of impulses that make up the sawtooth, and add up the impulse responses:

total = 0
for t, y in zip(wave.ts, wave.ys):
    total += shifted_scaled(response, t, y)

The result is what it would sound like to play a sawtooth wave in a firing range. Again, you can listen to it in chap10.ipynb.

Figure 10-5 shows a diagram of this computation, where f is the sawtooth, g is the impulse response, and h is the sum of the shifted, scaled copies of g.

Figure 10-5. Diagram of the sum of scaled and shifted copies of g.

For the example shown:

$$h[2] = f[0]\,g[2] + f[1]\,g[1] + f[2]\,g[0]$$

And more generally:

$$h[n] = \sum_{m=0}^{N-1} f[m]\, g[n-m]$$

You might recognize this equation from “Convolution”. It is the convolution of f and g. This shows that if the input is f and the impulse response of the system is g, the output is the convolution of f and g.
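
To make the equivalence concrete, here is a small array-based sketch of the "sum of shifted, scaled copies" computation, compared against NumPy's own convolution. The function name `sum_of_copies` and the sample values are made up for illustration; shifts are in samples rather than seconds.

```python
import numpy as np

def sum_of_copies(f, g):
    """Add up one copy of g per sample of f, shifted by m and scaled by f[m]."""
    h = np.zeros(len(f) + len(g) - 1)
    for m, amplitude in enumerate(f):
        h[m:m + len(g)] += amplitude * g
    return h

f = np.array([1.0, 0.5, -0.25])        # input, viewed as a sequence of impulses
g = np.array([1.0, 0.6, 0.3, 0.1])     # impulse response

h = sum_of_copies(f, g)
print(np.allclose(h, np.convolve(f, g)))
print(np.isclose(h[2], f[0] * g[2] + f[1] * g[1] + f[2] * g[0]))
```

Note that this is ordinary (non-circular) convolution, so the output is longer than either input.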

In summary, there are two ways to think about the effect of a system on a signal:

  1. The input is a sequence of impulses, so the output is the sum of scaled, shifted copies of the impulse response; that sum is the convolution of the input and the impulse response.

  2. The DFT of the impulse response is a transfer function that encodes the effect of the system on each frequency component as a magnitude and phase offset. The DFT of the input encodes the magnitude and phase offset of the frequency components it contains. Multiplying the DFT of the input by the transfer function yields the DFT of the output.

The equivalence of these descriptions should not be a surprise. It is basically a statement of the Convolution Theorem: convolution of f and g in the time domain corresponds to multiplication in the frequency domain.

And if you wondered why convolution is defined as it is, which seemed backward when we talked about smoothing and difference windows, now you know the reason: the definition of convolution appears naturally in the response of an LTI system to a signal.

Proof of the Convolution Theorem

Well, I’ve put it off long enough. It’s time to prove the Convolution Theorem (CT), which states:

$$\operatorname{DFT}(f \ast g) = \operatorname{DFT}(f) \cdot \operatorname{DFT}(g)$$

where $f$ and $g$ are vectors with the same length, $N$.

I’ll proceed in two steps:

  1. I’ll show that in the special case where $f$ is a complex exponential, convolution with $g$ has the effect of multiplying $f$ by a scalar.

  2. In the more general case where $f$ is not a complex exponential, we can use the DFT to express it as a sum of exponential components, compute the convolution of each component (by multiplication), and then add up the results.

Together these steps prove the Convolution Theorem. But first, let’s assemble the pieces we’ll need. The DFT of $g$, which I’ll call $G$, is:

$$\operatorname{DFT}(g)[k] = G[k] = \sum_n g[n] \exp(-2 \pi i n k / N)$$

where $k$ is an index of frequency from 0 to $N-1$ and $n$ is an index of time from 0 to $N-1$. The result is a vector of $N$ complex numbers.

The inverse DFT of $F$, which I’ll call $f$, is:

$$\operatorname{IDFT}(F)[n] = f[n] = \sum_k F[k] \exp(2 \pi i n k / N)$$

Here’s the definition of convolution:

$$(f \ast g)[n] = \sum_m f[m]\, g[n-m]$$

where $m$ is another index of time from 0 to $N-1$. Convolution is commutative, so I could equivalently write:

$$(f \ast g)[n] = \sum_m f[n-m]\, g[m]$$

Now let’s consider the special case where $f$ is a complex exponential with frequency $k$, which I’ll call $e_k$:

$$f[n] = e_k[n] = \exp(2 \pi i n k / N)$$

where $k$ is an index of frequency and $n$ is an index of time.

Plugging $e_k$ into the second definition of convolution yields:

$$(e_k \ast g)[n] = \sum_m \exp(2 \pi i (n-m) k / N)\, g[m]$$

We can split the first term into a product:

$$(e_k \ast g)[n] = \sum_m \exp(2 \pi i n k / N) \exp(-2 \pi i m k / N)\, g[m]$$

The first half does not depend on $m$, so we can pull it out of the summation:

$$(e_k \ast g)[n] = \exp(2 \pi i n k / N) \sum_m \exp(-2 \pi i m k / N)\, g[m]$$

Now we recognize that the first term is $e_k$, and the summation is $G[k]$ (using $m$ as the index of time). So we can write:

$$(e_k \ast g)[n] = e_k[n]\, G[k]$$

which shows that for each complex exponential, $e_k$, convolution with $g$ has the effect of multiplying $e_k$ by $G[k]$. In mathematical terms, each $e_k$ is an eigenvector of this operation, and $G[k]$ is the corresponding eigenvalue (see “Differentiation”).

Now for the second part of the proof. If the input signal, $f$, doesn’t happen to be a complex exponential, we can express it as a sum of complex exponentials by computing its DFT, $F$. For each value of $k$ from 0 to $N-1$, $F[k]$ is the complex magnitude of the component with frequency $k$.

Each input component is a complex exponential with magnitude $F[k]$, so each output component is a complex exponential with magnitude $F[k]\, G[k]$, based on the first part of the proof.

Because the system is linear, the output is just the sum of the output components:

$$(f \ast g)[n] = \sum_k F[k]\, G[k]\, e_k[n]$$

Plugging in the definition of $e_k$ yields:

$$(f \ast g)[n] = \sum_k F[k]\, G[k] \exp(2 \pi i n k / N)$$

The righthand side is the inverse DFT of the product $F G$. Thus:

$$(f \ast g) = \operatorname{IDFT}(F G)$$

Substituting $F = \operatorname{DFT}(f)$ and $G = \operatorname{DFT}(g)$:

$$(f \ast g) = \operatorname{IDFT}(\operatorname{DFT}(f) \cdot \operatorname{DFT}(g))$$

Finally, taking the DFT of both sides yields the Convolution Theorem:

$$\operatorname{DFT}(f \ast g) = \operatorname{DFT}(f) \cdot \operatorname{DFT}(g)$$

QED.
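
A numerical spot check is not a proof, but it can make both steps easier to trust. The sketch below assumes circular convolution (indices wrap modulo $N$), which is the form the DFT version of the theorem applies to; `circular_convolve`, the vector length, and the chosen frequency index are illustrative choices.

```python
import numpy as np

def circular_convolve(f, g):
    """Direct O(N^2) circular convolution: sum over m of f[m] * g[(n - m) mod N]."""
    N = len(f)
    return np.array([sum(f[m] * g[(n - m) % N] for m in range(N))
                     for n in range(N)])

rng = np.random.default_rng(3)
N = 16
f = rng.standard_normal(N)
g = rng.standard_normal(N)
G = np.fft.fft(g)

# Step 1: e_k is an eigenvector of "convolve with g", with eigenvalue G[k]
k = 3
n = np.arange(N)
e_k = np.exp(2j * np.pi * n * k / N)
step1 = np.allclose(circular_convolve(e_k, g), G[k] * e_k)

# The theorem: DFT(f * g) equals DFT(f) times DFT(g), elementwise
theorem = np.allclose(np.fft.fft(circular_convolve(f, g)), np.fft.fft(f) * G)

print(step1, theorem)
```

NumPy's `ifft` includes a factor of $1/N$ that the formulas above leave out; the check is unaffected because only the forward transform is used.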

Exercises

Solutions to these exercises are in chap10soln.ipynb.

Exercise 10-1.

In “Systems and Convolution” I describe convolution as the sum of shifted, scaled copies of a signal.

But in “Acoustic Response”, when we multiply the DFT of the signal by the transfer function, that operation corresponds to circular convolution, which assumes that the signal is periodic. As a result, you might notice that the output contains an extra note at the beginning, which wraps around from the end.

Fortunately, there is a standard solution to this problem. If you add enough zeros to the end of the signal before computing the DFT, you can avoid the wraparound effect.
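
The wraparound effect and the zero-padding fix are easy to see on tiny arrays before trying them on the recordings. The values below are made up; the sketch relies on the second argument of `np.fft.fft`, which zero-pads the input to the requested length.

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0])
response = np.array([1.0, 0.5, 0.25, 0.0])

# No padding: the tail of the convolution wraps around and lands on the start
wrapped = np.fft.ifft(np.fft.fft(signal) * np.fft.fft(response)).real

# Pad both to len(signal) + len(response) - 1 samples, so nothing wraps
n = len(signal) + len(response) - 1
padded = np.fft.ifft(np.fft.fft(signal, n) * np.fft.fft(response, n)).real

linear = np.convolve(signal, response)   # ordinary, non-circular convolution

print(wrapped)
print(padded)
print(np.allclose(padded, linear))
```

Comparing `wrapped` with the first four elements of `linear` shows exactly which samples were contaminated by the wrapped tail.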

Modify the example in chap10.ipynb and confirm that zero-padding eliminates the extra note at the beginning of the output.

Exercise 10-2.

The Open AIR library provides a “centralized... on-line resource for anyone interested in auralization and acoustical impulse response data” (http://www.openairlib.net). Browse its collection of impulse response data and download one that sounds interesting. Find a short recording that has the same sample rate as the impulse response you downloaded.

Simulate the sound of your recording in the space where the impulse response was measured, computed two ways: by convolving the recording with the impulse response and by computing the filter that corresponds to the impulse response and multiplying by the DFT of the recording.

Chapter 11. Modulation and Sampling

In “Aliasing” we saw that when a signal is sampled at 10,000 Hz, a component at 5500 Hz is indistinguishable from a component at 4500 Hz. In this example, the folding frequency, 5000 Hz, is half of the sampling rate. But I didn’t explain why.

This chapter explores the effect of sampling and presents the Sampling Theorem, which explains aliasing and the folding frequency.

I’ll start by exploring the effect of convolution with impulses; I’ll use that effect to explain amplitude modulation (AM), which turns out to be useful for understanding the Sampling Theorem.

The code for this chapter is in chap11.ipynb, which is in the repository for this book (see “Using the Code”). You can also view it at http://tinyurl.com/thinkdsp-ch11.

Convolution with Impulses

As we saw in “Systems and Convolution”, convolution of a signal with a series of impulses has the effect of adding up shifted, scaled copies of the signal.

As an example, I’ll read a signal that sounds like a beep:

filename = '253887__themusicalnomad__positive-beeps.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()

And I’ll construct a wave with four impulses:

imp_sig = thinkdsp.Impulses([0.005, 0.3, 0.6,  0.9], 
                       amps=[1,     0.5, 0.25, 0.1])
impulses = imp_sig.make_wave(start=0, duration=1.0, 
                             framerate=wave.framerate)

and then convolve them:

convolved = wave.convolve(impulses)

Figure 11-1 shows the results, with the signal in the top left, the impulses in the lower left, and the result on the right.

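Figure 11-1. The effect of convolving a signal (top left) with a series of impulses (bottom left). The result (right) is the sum of shifted, scaled copies of the signal.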

You can hear the result in chap11.ipynb; it sounds like a series of four beeps with decreasing loudness.

The point of this example is just to demonstrate that convolution with impulses makes shifted, scaled copies. This result will be useful in the next section.
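
The same effect is easy to check on a small array. The following is a minimal sketch that uses plain NumPy rather than thinkdsp; the toy signal and the impulse positions are made-up values chosen only for illustration.

```python
import numpy as np

# A short toy "signal" (hypothetical values, chosen only for illustration).
signal = np.array([1.0, 2.0, 3.0])

# An impulse train with impulses at indices 0 and 5, amplitudes 1 and 0.5.
impulses = np.zeros(8)
impulses[0] = 1.0
impulses[5] = 0.5

# Full convolution: each impulse produces a shifted, scaled copy of the signal.
convolved = np.convolve(signal, impulses)

# Build the same result by hand, adding one copy per impulse.
expected = np.zeros(len(signal) + len(impulses) - 1)
for index, amp in zip([0, 5], [1.0, 0.5]):
    expected[index:index + len(signal)] += amp * signal

print(convolved)
print(np.allclose(convolved, expected))
```

The first copy starts at index 0 with full amplitude, and the second starts at index 5 with half the amplitude, which is what the four beeps in the audio example do on a larger scale.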

Amplitude Modulation

Amplitude modulation (AM) is used to broadcast AM radio, among other applications. At the transmitter, the signal (which might contain speech, music, etc.) is “modulated” by multiplying it with a cosine signal that acts as a “carrier wave”. The result is a high-frequency wave that is suitable for broadcast by radio. Typical frequencies for AM radio in the United States are 500–1600 kHz (see https://en.wikipedia.org/wiki/AM_broadcasting).

At the receiving end, the broadcast signal is “demodulated” to recover the original signal. Surprisingly, demodulation works by multiplying the broadcast signal, again, by the same carrier wave.

To see how that works, I’ll modulate a signal with a carrier wave at 10 kHz. Here’s the signal:

# Read the recording, remove any DC offset, and normalize its amplitude.
filename = '105977__wcfl10__favorite-station.wav'
wave = thinkdsp.read_wave(filename)
wave.unbias()
wave.normalize()

And here’s the carrier:

# A 10 kHz cosine carrier with the same duration and frame rate as the signal.
carrier_sig = thinkdsp.CosSignal(freq=10000)
carrier_wave = carrier_sig.make_wave(duration=wave.duration,
                                     framerate=wave.framerate)

We can multiply them using the * operator, which multiplies the wave arrays elementwise:

modulated = wave * carrier_wave

The result sounds pretty bad. You can hear it in chap11.ipynb.

Figure 11-2 shows what’s happening in the frequency domain. The top row is the spectrum of the original signal. The next row is the spectrum of the modulated signal, after multiplying by the carrier. It contains two copies of the original spectrum, shifted by plus and minus 10 kHz.

To understand why, recall that convolution in the time domain corresponds to multiplication in the frequency domain. Conversely, multiplication in the time domain corresponds to convolution in the frequency domain. When we multiply the signal by the carrier, we are convolving its spectrum with the DFT of the carrier.

Since the carrier is a simple cosine wave, its DFT is two impulses, at plus and minus 10 kHz. Convolution with these impulses makes shifted, scaled copies of the spectrum. Notice that the amplitude of the spectrum is smaller after modulation. The energy from the original signal is split between the copies.
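
To make the shifted copies concrete, here is a small sketch in plain NumPy (not thinkdsp) that modulates a single test tone instead of a recording. The 1 kHz tone, the one-second duration, and the helper name peak_freqs are assumptions made for this illustration.

```python
import numpy as np

framerate = 44100          # samples per second (assumed for this sketch)
n = framerate              # one second of samples, so FFT bins are 1 Hz apart
ts = np.arange(n) / framerate

# A 1 kHz test tone stands in for the audio signal.
signal = np.cos(2 * np.pi * 1000 * ts)
carrier = np.cos(2 * np.pi * 10000 * ts)
modulated = signal * carrier

def peak_freqs(ys, threshold=0.1):
    """Return the frequencies (Hz) and normalized amplitudes above threshold."""
    amps = np.abs(np.fft.rfft(ys)) / len(ys)
    freqs = np.fft.rfftfreq(len(ys), d=1 / framerate)
    return freqs[amps > threshold], amps[amps > threshold]

print(peak_freqs(signal))     # one peak at 1000 Hz, amplitude 0.5
print(peak_freqs(modulated))  # peaks at 9000 and 11000 Hz, amplitude 0.25
```

Because rfft shows only non-negative frequencies, the copy shifted down by 10 kHz appears mirrored at 9000 Hz. Each peak has half the amplitude of the original, which matches the smaller spectrum in the figure.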

We demodulate the signal by multiplying by the carrier wave again:

demodulated = modulated * carrier_wave

The third row of Figure 11-2 shows the result. Again, multiplication in the time domain corresponds to convolution in the frequency domain, which makes shifted, scaled copies of the spectrum.

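Figure 11-2. Demonstration of amplitude modulation. The top row is the spectrum of the signal; the second row is the spectrum after modulation; the third row is the spectrum after demodulation; the last row is the demodulated signal after low-pass filtering.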

Since the modulated spectrum contains two peaks, at plus and minus 10 kHz, each peak gets split in half and shifted by plus and minus 10 kHz. Two of the copies meet at 0 kHz and get added together; the other two copies end up centered at plus and minus 20 kHz.

If you listen to the demodulated signal, it sounds pretty good. The extra copies of the spectrum add high-frequency components that were not in the original signal. These are so high that most speakers can’t play them and most people can’t hear them, but if you have good speakers and good ears, you might.

In that case, you can get rid of the extra components by applying a low-pass filter:

# Remove components above 10 kHz, then convert back to a wave.
demodulated_spectrum = demodulated.make_spectrum(full=True)
demodulated_spectrum.low_pass(10000)
filtered = demodulated_spectrum.make_wave()

The result is quite close to the original wave, although its amplitude is reduced by about half after demodulating and filtering. That’s not a problem in practice, because much more is lost in transmitting and receiving the broadcast signal. Since we have to amplify the result anyway, another factor of 2 is not an issue.
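
The whole round trip can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code from chap11.ipynb: a 1 kHz test tone replaces the recording, and the brick wall filter is applied by zeroing FFT bins directly.

```python
import numpy as np

framerate = 44100                            # samples per second (assumed)
n = framerate                                # one second of samples
ts = np.arange(n) / framerate

signal = np.cos(2 * np.pi * 1000 * ts)       # 1 kHz test tone (assumed)
carrier = np.cos(2 * np.pi * 10000 * ts)     # 10 kHz carrier, as in the text

modulated = signal * carrier
demodulated = modulated * carrier

# Brick wall low-pass filter: zero every component above the cutoff.
spectrum = np.fft.rfft(demodulated)
freqs = np.fft.rfftfreq(n, d=1 / framerate)
spectrum[freqs > 10000] = 0
filtered = np.fft.irfft(spectrum, n)

# The recovered wave has the same shape as the original at half the amplitude.
print(np.max(np.abs(filtered - signal / 2)))
```

The printed difference is at the level of floating-point error, so the filtered wave is the original tone scaled by one half.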

Sampling

I explained amplitude modulation in part because it is interesting, but mostly because it will help us understand sampling. Sampling is the process of measuring an analog signal at a series of points in time, usually with equal spacing.

For example, the WAV files we have used as examples were recorded by sampling the output of a microphone using an analog-to-digital converter (ADC). The sampling rate for most of them is 44.1 kHz, which is the standard rate for “CD-quality” sound, or 48 kHz, which is the standard for DVD sound.

At 48 kHz, the folding frequency is 24 kHz, which is higher than most people can hear (see https://en.wikipedia.org/wiki/Hearing_range).

In most of these waves, each sample has 16 bits, so there are 2^16 distinct levels. This “bit depth” turns out to be enough that adding more bits does not improve the sound quality noticeably (see https://en.wikipedia.org/wiki/Digital_audio).
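
As a quick check of the arithmetic, the folding frequency and the number of levels follow directly from the sampling rate and the bit depth. The variable names here are only illustrative.

```python
framerate = 48000                   # samples per second
folding_frequency = framerate / 2   # highest frequency that can be represented
bit_depth = 16                      # bits per sample
levels = 2 ** bit_depth             # number of distinct sample values

print(folding_frequency)  # 24000.0
print(levels)             # 65536
```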

Of course, applications other than audio signals might require higher sampling rates in order to capture higher frequencies, or higher bit depth in order to reproduce waveforms with more fidelity.

To demonstrate the effect of the sampling process, I am going to start with a wave sampled at 44.1 kHz and select samples from it at about 11 kHz. This is not exactly the same as sampling from an analog signal, but the effect is the same.

First I’ll load a recording of a drum solo:

# Read the drum solo and normalize its amplitude.
filename = '263868__kevcio__amen-break-a-160-bpm.wav'
wave = thinkdsp.read_wave(filename)
wave.normalize()

Figure 11-3 (top) shows the spectrum of this wave. Now here’s the function that samples from the wave:

import numpy as np

def sample(wave, factor=4):
    """Keep every `factor`-th sample of wave and set the rest to zero."""
    ys = np.zeros(len(wave))
    ys[::factor] = wave.ys[::factor]
    return thinkdsp.Wave(ys, framerate=wave.framerate)

I’ll use it to select every fourth element:

sampled = sample(wave, 4)

The result has the same frame rate as the original, but most of the elements are zero. If you play the sampled wave, it doesn’t sound very good. The sampling process introduces high-frequency components that were not in the original.

Figure 11-3 (bottom) shows the spectrum of the sampled wave. It contains four copies of the original spectrum (it looks like five copies because one is split between the highest and lowest frequencies).
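
The copies are easier to count with a single tone than with a drum solo. The following sketch uses plain NumPy; the 1 kHz tone and the one-second duration are assumptions, chosen so that each FFT bin is exactly 1 Hz wide.

```python
import numpy as np

framerate = 44100
n = framerate                       # one second, so FFT bins are 1 Hz apart
ts = np.arange(n) / framerate
ys = np.cos(2 * np.pi * 1000 * ts)  # 1 kHz test tone (assumed)

# Keep every fourth sample and set the rest to zero, as sample() does.
factor = 4
sampled = np.zeros(n)
sampled[::factor] = ys[::factor]

# Full (two-sided) spectrum, with frequencies from 0 up to the frame rate.
amps = np.abs(np.fft.fft(sampled)) / n
peaks = np.flatnonzero(amps > 0.05)
print(peaks)
```

The original tone has a pair of peaks at 1000 Hz and 43,100 Hz (the negative-frequency partner in the full spectrum). After sampling, that pair appears four times, offset by multiples of 11,025 Hz, for eight peaks in total.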

Figure 11-3. Spectrum of a signal before (top) and after (bottom) sampling.

To understand where these copies come from, we can think of the sampling process as multiplication with a series of impulses. Instead of using sample to select every fourth element, we could use this function to make a series of impulses, sometimes called an impulse train:

def make_impulses(wave, factor):
    """Make an impulse train with one impulse every `factor` samples."""
    ys = np.zeros(len(wave))
    ys[::factor] = 1
    ts = np.arange(len(wave)) / wave.framerate
    return thinkdsp.Wave(ys, ts, wave.framerate)

And then multiply the original wave by the impulse train:

impulses = make_impulses(wave, 4)
sampled = wave * impulses

The result is the same; it still doesn’t sound very good, but now we understand why. Multiplication in the time domain corresponds to convolution in the frequency domain. When we multiply by an impulse train, we are convolving with the DFT of an impulse train. As it turns out, the DFT of an impulse train is also an impulse train.

Figure 11-4 shows two examples. The top row is the impulse train in the example, with frequency 11,025 Hz. The DFT is a train of four impulses, which is why we get four copies of the spectrum. The bottom row shows an impulse train with a lower frequency, about 5512 Hz. Its DFT is a train of eight impulses. In general, more impulses in the time domain correspond to fewer impulses in the frequency domain.
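
This relationship can be checked numerically on a short array. In this sketch the length 64 and the helper name impulse_train_dft are assumptions made for illustration; the spacings 4 and 8 mirror the two rows of the figure.

```python
import numpy as np

def impulse_train_dft(n, factor):
    """Return indices of the nonzero DFT bins of an impulse train.

    The train has length n with an impulse every `factor` samples.
    """
    ys = np.zeros(n)
    ys[::factor] = 1
    amps = np.abs(np.fft.fft(ys))
    return np.flatnonzero(amps > 1e-6)

n = 64
print(impulse_train_dft(n, 4))   # 4 impulses, 16 bins apart
print(impulse_train_dft(n, 8))   # 8 impulses, 8 bins apart
```

With a spacing of 4 there are 16 impulses in time and 4 in frequency; with a spacing of 8 there are 8 in time and 8 in frequency.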

In summary:

  • We can think of sampling as multiplication by an impulse train.

  • Multiplying by an impulse train corresponds to convolution with an impulse train in the frequency domain.

  • Convolution with an impulse train makes multiple copies of the signal’s spectrum.

Figure 11-4. The DFT of an impulse train is also an impulse train.

Aliasing

In “Amplitude Modulation”, after demodulating an AM signal we got rid of the extra copies of the spectrum by applying a low-pass filter. We can do the same thing after sampling, but it turns out not to be a perfect solution.

Figure 11-5 shows why not. The top row is the spectrum of the drum solo. It contains high-frequency components that extend past 10 kHz. When we sample this wave, we convolve the spectrum with the impulse train (second row), which makes copies of the spectrum (third row). The bottom row shows the result after applying a low-pass filter with a cutoff at the folding frequency, 5512 Hz.

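Figure 11-5. Spectrum of the drum solo (top), spectrum of the impulse train (second row), spectrum of the sampled wave (third row), and the result after low-pass filtering (bottom).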

If we convert the result back to a wave, it is similar to the original wave, but there are two problems:

  • Because of the low-pass filter, the components above 5500 Hz have been lost, so the result sounds muted.

  • Even the components below 5500 Hz are not quite right, because they include contributions from the spectral copies we tried to filter out.

If the spectral copies overlap after sampling, we lose information about the spectrum and we won’t be able to recover it.
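
A single tone shows why the information is gone. In this sketch, which uses plain NumPy and an assumed 7000 Hz test tone, a component above the folding frequency produces exactly the same samples as a component below it.

```python
import numpy as np

framerate = 11025                 # sampling rate in Hz
ns = np.arange(100)               # sample indices
ts = ns / framerate               # sample times in seconds

# 7000 Hz is above the folding frequency (5512.5 Hz); 4025 Hz is below it.
high = np.cos(2 * np.pi * 7000 * ts)
alias = np.cos(2 * np.pi * (framerate - 7000) * ts)

print(np.max(np.abs(high - alias)))  # floating-point error only
```

Once the two components have been sampled, nothing in the data distinguishes them, so no filter applied afterward can separate them.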

But if the copies don’t overlap, things work out pretty well. As a second example, I loaded a recording of a bass guitar solo.

Figure 11-6 shows its spectrum (top row), which contains no visible energy above 5512 Hz. The second row shows the spectrum of the sampled wave, and the third row shows the spectrum after the low-pass filter. The amplitude is lower because we’ve filtered out some of the energy, but the shape of the spectrum is almost exactly what we started with. And if we convert back to a wave, it sounds the same.

Figure 11-6. Spectrum of a bass guitar solo (top), its spectrum after sampling (middle), and after filtering (bottom).

Interpolation

The low-pass filter I used in the last step is a so-called brick wall filter; frequencies above the cutoff are removed completely, as if they hit a brick wall.

Figure 11-7 (right) shows what this filter looks like. Of course, multiplication by this filter in the frequency domain corresponds to convolution with a window in the time domain. We can find out what that window is by computing the inverse DFT of the filter, which is shown in Figure 11-7 (left).

Figure 11-7. A brick wall low-pass filter (right) and the corresponding convolution window (left).

That function has a name; it is the normalized sinc function, or at least a discrete approximation of it (see https://en.wikipedia.org/wiki/Sinc_function):

sinc(x) = sin(πx) / (πx)
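
To see how close the discrete window is to a sinc function, here is a sketch that builds a brick wall filter with NumPy and compares its inverse DFT to np.sinc, which computes the normalized sinc function. The DFT length and cutoff are arbitrary values assumed for this illustration, and the code is independent of the thinkdsp code behind the figure.

```python
import numpy as np

n = 1024                          # number of points in the DFT (assumed)
cutoff = 128                      # keep bins with |k| <= cutoff (assumed)

# Brick wall filter over the full two-sided spectrum.
ks = np.fft.fftfreq(n, d=1 / n)   # bin indices: 0..n/2-1, then -n/2..-1
brick_wall = (np.abs(ks) <= cutoff).astype(float)

# The inverse DFT of the filter is the window we convolve with in time.
window = np.fft.ifft(brick_wall).real

# Compare with a scaled, stretched sinc function near the center.
width = (2 * cutoff + 1) / n      # fraction of bins that pass the filter
lags = np.arange(-20, 21)         # negative lags wrap around the array
approx = width * np.sinc(width * lags)
print(np.max(np.abs(window[lags] - approx)))
```

Near the center of the window, the difference is several orders of magnitude smaller than the peak value, which is why the window can be treated as a discrete approximation of sinc.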

When we apply the low-pass filter, we are convolving with a sinc function. We can think of this convolution as the sum of shifted, scaled copies of the sinc function.

The value of sinc is 1 at 0 and 0 at every other integer value of x. When we shift the sinc function, we move the zero point. When we scale it, we change the height at the zero point. So when we add up the shifted, scaled copies, they interpolate between the sampled points.

Figure 11-8 shows how that works using a short segment of the bass guitar solo. The line across the top is the original wave. The vertical gray lines show the sampled values. The thin curves are the shifted, scaled copies of the sinc function. The sum of these sinc functions is identical to the original wave.

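Figure 11-8. Close-up view of a sequence of samples (vertical gray lines), interpolating sinc functions (thin curves), and the original wave (thicker line across the top).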

I’ll say that again, because it is surprising and important:

The sum of these sinc functions is identical to the original wave.
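
The interpolation can be tried directly with np.sinc. This is a sketch under stated assumptions: the signal is a cosine at 0.1 cycles per sample, and only 401 samples are used, so the sum is truncated and the match between samples is approximate rather than exact. The function name sinc_interpolate is hypothetical.

```python
import numpy as np

# Sample a band-limited signal: 0.1 cycles per sample, well below the
# folding frequency of 0.5 cycles per sample (values assumed for this sketch).
ns = np.arange(-200, 201)               # sample indices
samples = np.cos(2 * np.pi * 0.1 * ns)

def sinc_interpolate(ts):
    """Evaluate the sum of shifted, scaled sinc functions at times ts."""
    # Row i holds sinc(ts[i] - n) for every sample index n.
    kernels = np.sinc(ts[:, np.newaxis] - ns[np.newaxis, :])
    return kernels @ samples

# Evaluate between the samples, near the middle of the sampled range.
ts = np.linspace(-5, 5, 201)
reconstructed = sinc_interpolate(ts)
original = np.cos(2 * np.pi * 0.1 * ts)

print(np.max(np.abs(reconstructed - original)))
```

At the sample times the sum reproduces the samples, because every other sinc copy is zero there. Between samples the error is small and comes from truncating the sum; it shrinks as more samples are included on either side.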

Because we started with a signal that contained no energy above 5512 Hz, and we sampled at 11,025 Hz, we were able to recover the original spectrum exactly. And if we have the original spectrum, exactly, we can recover the original wave exactly.

In this example, I started with a wave that had already been sampled at 44,100 Hz, and I resampled it at 11,025 Hz. After resampling, the gap between the spectral copies is 11.025 kHz.

If the original wave contains no energy above 5512 Hz, the spectral copies don’t overlap, we don’t lose information, and we can recover the original signal exactly.

This result is known as the Nyquist–Shannon Sampling Theorem (see https://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem).

This example does not prove the Sampling Theorem, but I hope it helps you understand what it says and why it works.

Notice that the argument I made does not depend on the original sampling rate, 44.1 kHz. The result would be the same if the original had been sampled at a higher frequency, or even if the original had been a continuous analog signal: if we sample at frame rate f, we can recover the original signal exactly, as long as it contains no energy at frequencies above f/2. A signal like that is called bandwidth limited.

Summary

Congratulations! You have reached the end of the book (well, except for a few more exercises). Before you close the book, I want to review how we got here:

  • We started with periodic signals and their spectrums, and I introduced the key objects in the thinkdsp library: Signal, Wave, and Spectrum.

  • We looked at the harmonic structure of simple waveforms and recordings of musical instruments, and we saw the effect of aliasing.

  • Using spectrograms, we explored chirps and other sounds whose spectrum changes over time.

  • We generated and analyzed noise signals, and characterized natural sources of noise.

  • We used the autocorrelation function for pitch estimation and additional characterization of noise.

  • We learned about the Discrete Cosine Transform (DCT), which is useful for compression and also a step toward understanding the Discrete Fourier Transform (DFT).

  • We used complex exponentials to synthesize complex signals, then we inverted the process to develop the DFT. If you did the exercises at the end of Chapter 7, you implemented the Fast Fourier Transform (FFT), one of the most important algorithms of the 20th century.

  • Starting with smoothing, I presented the definition of convolution and stated the Convolution Theorem, which relates operations like smoothing in the time domain to filters in the frequency domain.

  • We explored differentiation and integration as linear filters, which is the basis of spectral methods for solving differential equations. It also explains some of the effects we saw in previous chapters, like the relationship between white noise and Brownian noise.

  • We learned about LTI system theory and used the Convolution Theorem to characterize LTI systems by their impulse response.

  • I presented amplitude modulation (AM), which is important in radio communication and also a step toward understanding the Sampling Theorem, a surprising result that is critical for digital signal processing.

If you got this far, you should have a good balance of practical knowledge (how to work with signals and spectrums using computational tools) and theory (an understanding of how and why sampling and filtering work).

I hope you had some fun along the way. Thank you!

Exercises

Solutions to these exercises are in chap11soln.ipynb.

Exercise 11-1.

The code in this chapter is in chap11.ipynb. Read through it and listen to the examples.

Exercise 11-2.

Chris “Monty” Montgomery has an excellent video called “D/A and A/D | Digital Show and Tell”; it demonstrates the Sampling Theorem in action, and presents lots of other excellent information about sampling. Watch it at https://www.youtube.com/watch?v=cIQ9IXSUzuM.

Exercise 11-3.

As we have seen, if you sample a signal at too low a frame rate, frequencies above the folding frequency get aliased. Once that happens, it is no longer possible to filter out these components, because they are indistinguishable from lower frequencies.

It is a good idea to filter out these frequencies before sampling; a low-pass filter used for this purpose is called an anti-aliasing filter.

Returning to the drum solo example, apply a low-pass filter before sampling, then apply the low-pass filter again to remove the spectral copies introduced by sampling. The result should be identical to the filtered signal.

About the Author

Allen B. Downey is a Professor of Computer Science at Olin College of Engineering. He has taught at Wellesley College, Colby College and U.C. Berkeley. He has a Ph.D. in Computer Science from U.C. Berkeley and Master’s and Bachelor’s degrees from MIT.

Colophon

The animal on the cover of Think DSP is a smooth-billed ani (Crotophaga ani), a large bird that is part of the cuckoo family. It is found in Florida, the Bahamas, Caribbean islands, and parts of Central and South America.

Smooth-billed anis have black plumage, long tails, and large ridged beaks. They feed on the ground, with a diet made up of termites, insects, and even small lizards and frogs. The birds prefer semi-open habitats with a mix of fields and brushy thickets. As human settlements and deforestation have affected their territory, anis have adapted by frequenting farm pastures and eating the insects flushed out by livestock.

This species is very social and is always found in noisy groups. Mating pairs nest communally with several other couples, taking turns to construct a bowl-shaped nest high in a tree, incubate eggs, and feed the chicks. Each female lays 4–7 eggs, but nests have been found with up to 29 eggs.

Many of the animals on O’Reilly covers are endangered; all of them are important to the world. To learn more about how you can help, go to animals.oreilly.com.

The cover image is from the Braukhaus Lexicon. The cover fonts are URW Typewriter and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe Myriad Condensed; and the code font is Dalton Maag’s Ubuntu Mono.